By Joe Clabby, Clabby Analytics
Someday, information systems will be able to program themselves using machine logic or by following human natural-language commands. Until that day comes, anything vendors can do to simplify coding, making it easier for developers and users to build new applications and streamline process flows, will be welcomed by the business community. Programming simplification enables enterprises to use individuals with little technical or programming background to accomplish a variety of complex tasks.
For instance, consider the field of analytics. By simplifying the application development process, vendors make it possible for enterprises to use “non-techies” to perform: 1) descriptive analytics (what is happening?); 2) diagnostic analytics (why is it happening?); 3) predictive analytics (what is likely to happen?); and 4) prescriptive analytics (what should I do about it?).
RapidMiner makes a code-free, advanced analytics environment that simplifies query and analytics tasks. It features a machine learning environment and offers data mining, text mining, predictive analytics and business analytics facilities. According to RapidMiner, its products were “built by data scientists for data scientists, business analysts and developers”. Its user community now numbers over a quarter of a million, and these users work with RapidMiner products more than six hours per day. The company has 600 in-production customers, of which 100 are paying customers. Its revenue stream is approximately 70% software product driven and 30% services driven.
How It Works
Users select various “processes” (workflows) and work their way through a variety of “operators” (work steps) within those processes. Operators do things such as create loops or conditional branches to control process flow; provide utilities for sub-processes; enable easy access to read/write data in repositories; enable data import/export; simplify data transformation; enable modeling (data mining processes); and compute the quality of a model. If the user understands how to structure a query and flow work, then building that query, tying it into a variety of data sources, and modeling it becomes a series of point-and-click template choices. The underlying “code” that talks to the system/database is hidden from the user.
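The process-and-operator idea can be sketched in a few lines of code. To be clear, this is a hypothetical illustration of the concept only; the `Operator` and `Process` classes below are invented for this sketch and are not RapidMiner’s actual API (RapidMiner users assemble these flows by point-and-click, never by writing code like this):

```python
# Hypothetical sketch of the "process of operators" concept -- NOT RapidMiner's API.

class Operator:
    """A single work step: takes input data, transforms it, passes it on."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, data):
        return self.fn(data)

class Process:
    """A workflow: an ordered chain of operators."""
    def __init__(self, operators):
        self.operators = operators

    def run(self, data):
        for op in self.operators:   # each operator feeds the next
            data = op.run(data)
        return data

# Example flow: data access -> data transformation -> a trivial "modeling" step
process = Process([
    Operator("read",      lambda _: [3, 1, 4, 1, 5]),       # read from a repository
    Operator("transform", lambda xs: [x * 2 for x in xs]),  # transform the data
    Operator("model",     lambda xs: sum(xs) / len(xs)),    # compute a simple model
])
print(process.run(None))  # -> 5.6
```

The point of the sketch is the shape of the workflow: each operator is a self-contained step, and a process is just a chain of them, which is why a point-and-click template can stand in for hand-written code.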
A closer look at the RapidMiner product offering shows that it can interact with 450 data sources, including prepackaged integration with open source Apache Solr (search facilities), as well as with Qlik (self-service data visualization & discovery – a leading business intelligence tool) and Mozenda (screen scraping, data scraping, data transformation). At present there are more than 1,500 operators from which to choose within the RapidMiner environment, as well as 250 machine learning functions. With open source integration, ties into other packages that assist in analytics activities, and ties into multiple data sources, RapidMiner makes it possible to create all sorts of queries using a wide variety of tools on a broad array of data.
RapidMiner can be deployed in three configurations: 1) as an “in-memory” environment; 2) as a traditional in-database environment; and, 3) pushed down to execute inside Hadoop as part of a Hadoop Big Data environment. It can mine data within a traditional on-premises data center, or it can mine data from external cloud environments. Further, RapidMiner just announced that it can capture and analyze streamed data – an important step when it comes to analyzing massive amounts of real-time data (where the data is streamed and analyzed but not kept in storage). This is a very big step for RapidMiner – it enables the company to compete in the Internet-of-Things (sensory data) and machine-to-machine communications markets – both fast-growing market segments where few other tools exist that can provide advanced analytics/modeling on large volumes of streamed data.
RapidMiner offers several product choices: 1) RapidMiner Studio (a desktop environment); 2) RapidMiner Server (a server environment); 3) RapidMiner Streams; 4) RapidMiner Cloud (for access to a RapidMiner cloud repository and cloud connectors); 5) RapidMiner Managed Server (a managed server environment); and, 6) RapidMiner Radoop (a Hadoop offering).
A closer look at RapidMiner’s new streaming product shows that it is integrated with Apache Storm for processing streaming data from social media, manufacturing equipment, wearable sensors, and other devices. Apache Storm operations can be mixed with the 1,500 RapidMiner operators (described earlier) to perform actions such as real-time sentiment analysis, embedding predictive actions into manufacturing control systems, or prescribing physical activities.
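The essence of the streaming approach described above is that each event is scored as it arrives and then discarded, so only an aggregate result is kept in memory. The toy sketch below illustrates that idea with a deliberately naive word-list sentiment scorer; it is invented for illustration and is not Apache Storm or RapidMiner code:

```python
# Illustrative sketch of stream analytics (analyze, then discard) --
# NOT Apache Storm or RapidMiner code; the word lists are toy examples.

POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "slow", "hate"}

def sentiment_score(text):
    """Naive score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def analyze_stream(events):
    """Consume events one at a time, keeping only a running aggregate."""
    total, count = 0, 0
    for text in events:          # each event is handled once, then dropped
        total += sentiment_score(text)
        count += 1
    return total / count if count else 0.0

stream = iter(["love this product", "service was slow", "great support"])
print(analyze_stream(stream))    # average sentiment of the stream
```

Because nothing but the running totals survives the loop, memory use stays flat no matter how many events flow through, which is what makes this style of analysis viable for high-volume IoT and machine-to-machine data.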
RapidMiner’s “Starter Edition” is available on the company’s Web site as a free download. Full license prices range from $999 for a desktop environment to $9,999 for a server environment.
Most people don’t want to learn how to program computers; they want to use computers as tools that help achieve an end result. A genome researcher, for instance, wants to understand how a gene sequence works – most genetic researchers don’t want to become computer programmers in order to achieve their analytics end results. If users are taught how to model and structure their queries, then tools from the RapidMiner product suite can help them flow processes and attach to the data sources they need to execute those queries. These tools are simple to use, enabling users to focus on what they do best – research and analysis – while leaving the programming tasks to the RapidMiner product. We’ve not quite arrived at the juncture where computers listen to natural language commands and generate programs accordingly – but RapidMiner has taken us one step closer.