Sparkflows makes it easy to do machine learning with Spark MLlib.
Apache Spark MLlib is described in detail in the MLlib Programming Guide: http://spark.apache.org/docs/latest/ml-guide.html
As the guide shows, Apache Spark provides a range of algorithms, including classification, regression, and clustering algorithms.
Classification: Logistic Regression, Decision Tree Classifier, Random Forest Classifier, etc.
Regression: Linear Regression, Decision Tree Regression, etc.
Machine Learning Nodes in Sparkflows
Sparkflows groups its Spark MLlib nodes into categories.
Each category contains a number of available nodes, or building blocks.
When we use a node such as LogisticRegression, a dialog box lets us set its parameters.
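To make the dialog's role concrete, here is a minimal Python sketch of what setting LogisticRegression parameters amounts to. The parameter names maxIter, regParam, elasticNetParam, and threshold, their defaults (100, 0.0, 0.0, 0.5), and their allowed ranges follow Spark MLlib's LogisticRegression. The helper validate_lr_params, and the assumption that a dialog checks its input this way, are hypothetical and do not describe Sparkflows' actual implementation.

```python
# Spark MLlib's documented defaults for these LogisticRegression params.
SPARK_LR_DEFAULTS = {
    "maxIter": 100,          # maximum optimizer iterations, must be >= 0
    "regParam": 0.0,         # regularization strength, must be >= 0
    "elasticNetParam": 0.0,  # 0.0 = pure L2 penalty, 1.0 = pure L1
    "threshold": 0.5,        # probability cutoff for predicting class 1
}


def validate_lr_params(user_params):
    """Hypothetical dialog check: merge user values over the defaults and
    reject anything outside the ranges Spark MLlib allows."""
    unknown = set(user_params) - set(SPARK_LR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**SPARK_LR_DEFAULTS, **user_params}
    if not isinstance(params["maxIter"], int) or params["maxIter"] < 0:
        raise ValueError("maxIter must be an integer >= 0")
    if params["regParam"] < 0:
        raise ValueError("regParam must be >= 0")
    if not 0.0 <= params["elasticNetParam"] <= 1.0:
        raise ValueError("elasticNetParam must be in [0, 1]")
    if not 0.0 <= params["threshold"] <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return params


print(validate_lr_params({"maxIter": 10, "regParam": 0.01}))
```

Merging user values over the defaults mirrors how any Spark param left unset falls back to its default value.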
Below is a workflow for spam detection using Logistic Regression.
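Conceptually, a spam detection workflow of this kind chains stages similar to the Tokenizer, HashingTF, and LogisticRegression pipeline used in the MLlib guide. The pure-Python sketch below imitates those stages so it runs without a cluster. It is an illustration under stated assumptions, not Spark's code: it substitutes zlib.crc32 and 1024 buckets for HashingTF's MurmurHash3 and default 2^18 features, fits the model with plain batch gradient descent rather than Spark's L-BFGS-based optimizers, and uses a made-up four-message training set.

```python
import math
import zlib

# Assumed bucket count; Spark's HashingTF defaults to 2**18 features.
NUM_FEATURES = 1024


def tokenize(text):
    """Lowercase and split on whitespace, as Spark's Tokenizer does."""
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Hashing trick: count each token in bucket crc32(token) % num_features."""
    vec = [0.0] * num_features
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % num_features] += 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, max_iter=200, step=0.5, reg_param=0.0):
    """Fit weights and an intercept by batch gradient descent on log loss.

    rows is a list of (feature_vector, label) pairs with labels 0 or 1.
    reg_param adds an L2 penalty on the weights (not the intercept).
    """
    n = len(rows[0][0])
    m = len(rows)
    w, b = [0.0] * n, 0.0
    for _ in range(max_iter):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                if xj:
                    grad_w[j] += err * xj
            grad_b += err
        for j in range(n):
            w[j] -= step * (grad_w[j] / m + reg_param * w[j])
        b -= step * grad_b / m
    return w, b


def predict(model, text, threshold=0.5):
    """Return 1 (spam) if P(spam) >= threshold, else 0 (ham)."""
    w, b = model
    x = hashing_tf(tokenize(text))
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0


# Made-up training messages: 1 = spam, 0 = ham.
training = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting at noon tomorrow", 0),
    ("project report attached", 0),
]
model = train_logistic_regression(
    [(hashing_tf(tokenize(text)), label) for text, label in training]
)
print(predict(model, "free money now"))              # spam-like message
print(predict(model, "report for meeting tomorrow"))  # ham-like message
```

In Spark itself, the same flow would use the pyspark.ml Tokenizer, HashingTF, and LogisticRegression classes chained in a Pipeline, with the work distributed across the cluster.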
Executing the workflow
The workflow can be executed on a Spark cluster.
As the workflow executes, the output of each node is streamed back to the browser and displayed.
Viewing past results of execution
The results of past executions can be viewed, including the models they produced and their scores. This is valuable for analyzing how our models have evolved over time.