
Machine Learning with Sparkflows


Sparkflows makes it easy to do machine learning with Spark MLlib.

Apache Spark MLlib is described in detail in its Programming Guide: http://spark.apache.org/docs/latest/ml-guide.html

Apache Spark provides a range of classification, regression, and clustering algorithms, among others.

  • http://spark.apache.org/docs/latest/ml-classification-regression.html

  • http://spark.apache.org/docs/latest/ml-clustering.html

  • Classification: Logistic Regression, Decision Tree Classifier, Random Forest Classifier, and others

  • Regression: Linear Regression, Decision Tree Regression

Machine Learning Nodes in Sparkflows

Sparkflows groups the Spark MLlib nodes into categories.


Within each category, a number of nodes, or building blocks, are available.


When using a node, say LogisticRegression, a dialog box lets us set its parameters.



Below is a workflow for spam detection using Logistic Regression.


Executing the workflow

The workflow can be executed on a Spark cluster.


When the workflow executes, the output of the nodes is streamed back to the browser and displayed.


Viewing past results of execution

The results of past executions can be viewed, including earlier models and their scores. This is valuable for analyzing how our models have evolved over time.


