Machine Learning Models for Mere Mortals

In ITProPortal, Rafael Zubairov, Senior Architect at DataArt, explains how a business can start using machine learning without dedicated software engineers or data science experts, and reviews the tools that make it possible.

“Today we have free access to open-source, closed-source, and cloud-based software products that help analysts with their daily tasks. These solutions combine modern, intuitive interfaces with simple functions for data ingestion, data cleansing, and model building. Consequently, a wider range of business people are empowered to verify hypotheses and build models of medium complexity without the need to hire software developers.

Looking into what data science systems offer for preparing data, we can see simple methods for configuring both the upload of data from all available sources, whether a relational database, a non-relational database, or a file on a shared drive, and preliminary data processing. The latter may mean a variety of operations, including filling in missing values and merging and joining tables. Many of these functions can be found in RapidMiner, H2O, and DataRobot.

Unfortunately, the majority (up to 80 per cent) of time in data analysis is consumed by data collection and cleansing. The processes that follow are no less important, but they are less strenuous: feature engineering, model selection, and fine-tuning are more intellectual, less labour-intensive, and automated to some degree in products like DataRobot or in libraries under the AutoML umbrella.

Creating machine learning and data science models in most of the tools is fairly simple. For example, H2O provides an interactive workbook with explanations of each model, ad-hoc suggestions, and thorough documentation. Each stage of data processing and modelling appears as a separate step in the workbook. RapidMiner, on the other hand, allows one to create a data flow graph in which steps like data processing and model calculation are represented as boxes, each with configurable properties specific to its action.”
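
To make the preliminary processing mentioned in the quote more concrete, here is a minimal Python sketch of two of the operations it names: filling in missing values and joining tables. It uses only the standard library and is meant as an illustration of the operations themselves, not of how RapidMiner, H2O, or DataRobot perform them. The sample tables, the column names, and the helper functions fill_missing and inner_join are all hypothetical, and filling with the column mean is just one of several possible strategies.

```python
from statistics import mean

# Hypothetical sample data: two small "tables" stored as lists of dicts.
customers = [
    {"customer_id": 1, "region": "North", "age": 34},
    {"customer_id": 2, "region": "South", "age": None},  # missing value
    {"customer_id": 3, "region": "North", "age": 52},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 2, "amount": 75.5},
    {"order_id": 12, "customer_id": 1, "amount": 30.0},
]


def fill_missing(rows, column):
    """Return new rows where None in `column` is replaced by the column mean.

    Assumes at least one row has a known (non-None) value in `column`.
    """
    known = [row[column] for row in rows if row[column] is not None]
    fallback = mean(known)
    return [
        {**row, column: fallback if row[column] is None else row[column]}
        for row in rows
    ]


def inner_join(left, right, key):
    """Join two tables on `key`, keeping only rows that match in both."""
    # Index the right-hand table by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[key], [])
    ]


cleaned = fill_missing(customers, "age")
joined = inner_join(orders, cleaned, "customer_id")

for row in joined:
    print(row)
```

Each order row in the result carries the customer's region and age alongside the order amount, and the customer with no orders does not appear because the join is an inner one. The tools discussed in the article expose comparable steps through a graphical interface rather than code.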
