Saving and Loading an ML Model

Sparkflows has a number of Nodes for Machine Learning. These Nodes generate models, which can be saved to files and read back later for use.


Here we create a KMeans Model, save it to a directory, read it back and use it for predictions.


Since the models are persisted to files, they can be used in other Workflows and Spark programs too.



Below is the workflow for the purpose.

In summary, the workflow does the following:


  • Reads in a dataset and assembles the features.

  • Splits the dataset into two parts with fractions (0.8, 0.2).

  • Performs KMeans clustering.

  • Saves the KMeans Model and then reads it back.

  • Performs prediction on the 20% split of the dataset.

  • Finally, prints 10 rows of the predictions.
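The sequence of steps above can be sketched outside Sparkflows as follows. This is only a minimal, dependency-free illustration in plain Python, not the Sparkflows Nodes or the Spark API: the toy KMeans, the synthetic two-cluster dataset, the JSON file name `centers.json`, and every function name here are assumptions made for the example. Only the directory name `modelsavepath` and the (0.8, 0.2) split come from the workflow.

```python
import json
import math
import os
import random
import tempfile


def make_data(seed=0):
    """Hypothetical input: two well-separated 2-D blobs of 50 points each."""
    rng = random.Random(seed)
    blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
    blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)]
    return blob_a + blob_b


def split(rows, fractions=(0.8, 0.2), seed=42):
    """Shuffle the rows and cut them into two parts by the given fractions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fractions[0])
    return shuffled[:cut], shuffled[cut:]


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def fit_kmeans(points, k, iterations=20):
    """Toy Lloyd's algorithm with farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def save_model(centers, directory):
    """Persist the cluster centers as JSON inside the model directory."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "centers.json"), "w") as f:
        json.dump({"k": len(centers), "centers": centers}, f)


def load_model(directory):
    """Read the cluster centers back from the model directory."""
    with open(os.path.join(directory, "centers.json")) as f:
        return json.load(f)["centers"]


def run_workflow(model_dir):
    """Split, train, save, reload, then predict on the 20% split."""
    train, test = split(make_data())
    centers = fit_kmeans(train, k=2)
    save_model(centers, model_dir)
    reloaded = load_model(model_dir)
    predictions = [(p, nearest(p, reloaded)) for p in test]
    return centers, reloaded, predictions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        _, _, predictions = run_workflow(os.path.join(tmp, "modelsavepath"))
        for features, cluster in predictions[:10]:  # print 10 rows
            print(features, cluster)
```

The prediction step uses only the reloaded centers, never the in-memory result of training. That is the property that would let a separate program pick up the saved model and use it.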


Saving and loading the model here is for demonstration purposes only. In this workflow, the model does not need to be saved and reloaded before it is used for prediction.


Below we see the configurations for the important Nodes. We also see that the schema is passed on from one Node to the next. Some of the Nodes also update the schema.

The model is saved in the directory ‘modelsavepath’.

Executing the Workflow


We next execute the workflow.

Input Data

KMeans Clusters

The model is saved as Parquet files, as shown below:



























© 2020 Sparkflows, Inc. All rights reserved. 

