- Jun 8, 2023

Sparkflows makes it extremely easy to implement a Continuous Machine Learning process within hours

Updated: Jun 9, 2023

A Continuous Machine Learning process ensures that Business Insights are generated promptly from the most recent dataset. It also keeps the ML model up to date, detects anomalies, and merges daily changes with historical data.

Let’s assume we have created the necessary workflows for data preparation, model training, model prediction, and analytical reports.

Continuous Machine Learning can be implemented by creating a Training Pipeline and a Prediction Pipeline in Sparkflows.

The Model Training Pipeline can be scheduled to run periodically.

First, the ‘Data Ingestion Workflow’ reads data from a data lake bucket that is regularly updated with incremental data.
Next, the ‘Data Preparation Workflow’ prepares the ingested data and creates the necessary features.
Finally, the ‘Model Training Workflow’ selects the model features, then trains and saves the model.
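The three training steps above can be sketched as an ordered chain of stages, each one consuming the output of the previous one. This is only a minimal illustration in plain Python, not Sparkflows code: the function names (`ingest`, `prepare`, `train`, `run_pipeline`), the sample rows, and the toy "model" are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

# Each stage takes the shared pipeline state and returns an updated copy.
Stage = Callable[[Dict], Dict]


def ingest(state: Dict) -> Dict:
    # Hypothetical stand-in for reading incremental files from the data lake bucket.
    rows = [
        {"customer_id": 1, "monthly_spend": 20.0, "churned": 0},
        {"customer_id": 2, "monthly_spend": 5.0, "churned": 1},
        {"customer_id": 3, "monthly_spend": 30.0, "churned": 0},
    ]
    return {**state, "raw": rows}


def prepare(state: Dict) -> Dict:
    # Derive one illustrative feature from the ingested rows.
    features = [
        {**row, "low_spend": int(row["monthly_spend"] < 10.0)}
        for row in state["raw"]
    ]
    return {**state, "features": features}


def train(state: Dict) -> Dict:
    # Placeholder "model": the observed churn rate among low-spend customers.
    low = [row for row in state["features"] if row["low_spend"]]
    rate = sum(row["churned"] for row in low) / len(low) if low else 0.0
    return {**state, "model": {"low_spend_churn_rate": rate}}


def run_pipeline(stages: List[Stage]) -> Dict:
    # Run the stages strictly in order, threading the state through each one.
    state: Dict = {}
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline([ingest, prepare, train])
print(result["model"])
```

In a real deployment each stage would be a Sparkflows workflow and the ordering would come from the pipeline definition and its schedule; the sketch only shows the dependency between the stages.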

The Model Prediction Pipeline can either be scheduled to run periodically or be invoked directly from another cloud-hosted service through an API.

First, the ‘Incremental Ingestion Workflow’ is triggered by either a scheduled run or an API call that determines the location of the latest files for prediction input.
Next, the ‘Data Preparation Workflow’ processes the latest data files and merges them with the ‘Training Input’ data so that the ‘Model Training Workflow’ always runs against the latest dataset.
This workflow can also output the dataset required for prediction. For example, to predict customer churn, we always need to find all the active customers in the latest data.
Then, the ‘Model Prediction Workflow’ reads the processed prediction input data and saves the predictions in the output bucket of the data lake.
Finally, the ‘Predictive Analytics Workflow’ is executed to generate the required Business Insights, which are instantly published into pre-defined Reports.
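The data preparation step of the prediction pipeline does two things: it merges the latest files into the historical training input, and it selects the records to score. A rough sketch of that logic in plain Python follows. The record layout (`customer_id`, `status`), the "newest record wins" merge rule, and the helper names `merge_latest` and `active_customers` are assumptions for illustration and are not taken from Sparkflows.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_latest(history: List[Record], increment: List[Record]) -> List[Record]:
    """Merge incremental records into history; the newest record per customer wins."""
    merged = {record["customer_id"]: record for record in history}
    for record in increment:
        # An incremental record replaces any older record for the same customer.
        merged[record["customer_id"]] = record
    return sorted(merged.values(), key=lambda record: record["customer_id"])


def active_customers(records: List[Record]) -> List[Record]:
    """Keep only customers that are still active, i.e. the ones worth scoring for churn."""
    return [record for record in records if record["status"] == "active"]


# Hypothetical historical training input and one day's incremental data.
history = [
    {"customer_id": 1, "status": "active"},
    {"customer_id": 2, "status": "active"},
]
increment = [
    {"customer_id": 2, "status": "cancelled"},
    {"customer_id": 3, "status": "active"},
]

training_input = merge_latest(history, increment)
prediction_input = active_customers(training_input)
print([record["customer_id"] for record in prediction_input])
```

Here `training_input` is what the next training run would read, while `prediction_input` is the subset passed on to the prediction step.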