Saving and Loading an ML Model

Sparkflows provides a number of Nodes for Machine Learning. These Nodes generate models, which can be saved to files and read back later for use.

 

Here we create a KMeans model, save it to a directory, read it back, and use it for predictions.

 

Since the models are persisted to files, they can be used in other Workflows and Spark programs too.
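
As a rough illustration of reuse from a separate Spark program, the sketch below loads a saved model with PySpark and scores a DataFrame. It assumes the directory was written by Spark ML's standard KMeans model writer and that the DataFrame already has the same features column that was used for training. The function name `score_with_saved_model` is hypothetical and is not part of Sparkflows.

```python
def score_with_saved_model(model_path, df):
    """Load a previously saved KMeansModel and add a 'prediction' column to df.

    Assumes df already contains the vector column the model was trained on.
    """
    # Imported here so this file can be imported without a Spark installation.
    from pyspark.ml.clustering import KMeansModel

    model = KMeansModel.load(model_path)  # reads the metadata/ and data/ subdirectories
    return model.transform(df)            # appends the cluster id for each row
```

A caller would pass the same path that the workflow saved to, for example `score_with_saved_model("modelsavepath", new_df)`, where `new_df` is any DataFrame prepared in the same way as the training data.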

Workflow

 

The workflow for this example is shown below.

In summary, the workflow does the following:

 

  • Reads in a dataset and assembles the features.

  • Splits the dataset into two parts (80% and 20%).

  • Performs KMeans clustering.

  • Saves the KMeans model and then reads it back.

  • Performs prediction on the 20% split of the dataset.

  • Finally, prints 10 rows of the predictions.
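
For readers who want to see the same steps outside the workflow editor, here is a minimal PySpark sketch of the list above. It is not the code Sparkflows generates. The CSV input, the column names, the value of `k`, the seed, and the function name `run_workflow` are all assumptions made for illustration.

```python
SPLIT_WEIGHTS = [0.8, 0.2]  # fractions for the two parts of the dataset


def run_workflow(spark, input_csv, feature_cols, model_path, k=3, seed=42):
    """Fit KMeans, save the model, load it back, and predict on the 20% split.

    All arguments are illustrative; k and seed are assumed values.
    """
    # Imported here so this file can be imported without a Spark installation.
    from pyspark.ml.clustering import KMeans, KMeansModel
    from pyspark.ml.feature import VectorAssembler

    # Read the dataset and assemble the feature columns into one vector column.
    df = spark.read.csv(input_csv, header=True, inferSchema=True)
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    assembled = assembler.transform(df)

    # Split the dataset into 80% and 20% parts.
    train, test = assembled.randomSplit(SPLIT_WEIGHTS, seed=seed)

    # Perform KMeans clustering on the 80% part.
    model = KMeans(k=k, seed=seed, featuresCol="features").fit(train)

    # Save the model (replacing any earlier copy), then read it back.
    model.write().overwrite().save(model_path)
    loaded = KMeansModel.load(model_path)

    # Predict on the 20% part and print 10 rows.
    predictions = loaded.transform(test)
    predictions.show(10)
    return predictions
```

With an existing `SparkSession` named `spark`, a call might look like `run_workflow(spark, "input.csv", ["x", "y"], "modelsavepath")`, where the file and column names are placeholders. The call to `overwrite()` is a design choice for repeated runs: without it, saving to a directory that already exists raises an error.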

 

The model is saved and loaded here for demonstration purposes only. In this workflow, the trained model could be used for prediction directly, without saving and loading it.

 

Below we see the configurations for the important Nodes. We also see that the schema is passed on from one Node to the next. Some Nodes also update the schema.

The model is saved in the directory ‘modelsavepath’.

Executing the Workflow

 

Next, we execute the workflow.

Input Data

KMeans Clusters

The model is saved as Parquet files, as shown below:

 

./data

./data/._common_metadata.crc

./data/._metadata.crc

./data/._SUCCESS.crc

./data/.part-r-00000-f7c21f7e-2a82-4593-937e-a73b7173db4e.gz.parquet.crc

./data/_common_metadata

./data/_metadata

./data/_SUCCESS

./data/part-r-00000-f7c21f7e-2a82-4593-937e-a73b7173db4e.gz.parquet

./metadata

./metadata/._SUCCESS.crc

./metadata/.part-00000.crc

./metadata/_SUCCESS

./metadata/part-00000
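
The listing above can also be inspected programmatically. The sketch below uses only the Python standard library and assumes the layout shown: a `metadata/part-00000` file whose first line is JSON (Spark ML typically records the model class, uid, and parameters there) and Parquet part files under `data`. The function name `summarize_saved_model` is hypothetical.

```python
import json
from pathlib import Path


def summarize_saved_model(model_dir):
    """Return (metadata dict, sorted Parquet file names) for a saved model directory."""
    root = Path(model_dir)

    # metadata/part-00000 is expected to hold a single line of JSON.
    first_line = (root / "metadata" / "part-00000").read_text().splitlines()[0]
    metadata = json.loads(first_line)

    # Keep only the Parquet part files; this skips the hidden .crc checksum
    # files and the _SUCCESS, _metadata, and _common_metadata markers.
    data_files = sorted(
        p.name
        for p in (root / "data").iterdir()
        if p.name.startswith("part-") and p.name.endswith(".parquet")
    )
    return metadata, data_files
```

Calling `summarize_saved_model("modelsavepath")` on a local copy of the directory returns the parsed metadata together with the names of the Parquet files that hold the model data.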


 

Predictions

© 2019 Sparkflows, Inc. All rights reserved. 
