Saving and Loading an ML Model
Sparkflows has a number of Nodes for Machine Learning. These Nodes generate models, which can be saved to files and read back later for use.
Here we create a KMeans Model, save it to a directory, read it back and use it for predictions.
Since the models are persisted to files, they can be used in other Workflows and Spark programs too.
The workflow for this is shown below.
The workflow in summary does the following:
Reads in a dataset and assembles the features.
Splits the dataset into two parts in a (0.8, 0.2) ratio.
Performs KMeans clustering.
Saves the KMeans Model and then reads it back.
Performs prediction on the 20% part of the dataset.
Finally, prints 10 rows of the predictions.
The Saving and Loading of the Model is for demonstration purposes only. In this workflow, the trained model could be used for prediction directly, without being saved and loaded.
Below we see the configurations for the important Nodes. The schema is passed on from one Node to the next, and some of the Nodes update it along the way.
The model gets saved in the directory ‘modelsavepath’.
Executing the Workflow
Next, we execute the workflow.
The model gets saved as Parquet files, as shown below: