Simplify collaboration across your data science project team with Sparkflows. Build advanced analytic models and orchestrate pipeline workflows with Sparkflows' built-in tools. Monitor your production ML models, with retesting, review, and reevaluation made easy.

With faster experimentation and deployment, plus end-to-end lineage tracking, Sparkflows improves the overall efficiency of the ML workflow.

Model registry

Sparkflows lets you store the feature engineering pipeline and the ML model in a model registry, promote them from the development environment to production, set triggers for drift detection, and automatically retrain the pipeline.

In addition to its native model registry, Sparkflows integrates with other model registries such as MLflow, and can be used to track and deploy models in MLflow.

Model formats

Sparkflows supports storing, registering, and deploying most open model formats, such as PMML, H2O MOJO, pickle files, and MLeap.

Each of these formats has its own tradeoffs: some are platform independent, some offer very low scoring latency, and some are more widely adopted. Sparkflows lets you choose the format in which to save a model, provided the underlying algorithm supports that format.
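As a small illustration of one of these formats, the sketch below round-trips a model through a pickle file using only the Python standard library. It does not use any Sparkflows API. The dictionary-based linear model, the helper names (`save_model`, `load_model`, `predict`, `roundtrip`), and the file name are all hypothetical stand-ins for a real trained estimator.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize the model object to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model from a pickle file. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, row):
    """Score one row with the hypothetical linear model (weights plus bias)."""
    return sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]


def roundtrip(model):
    """Save the model to a temporary file, load it back, and return the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        save_model(model, path)
        return load_model(path)
```

Pickle is convenient but tied to Python, which is one example of the tradeoff mentioned above: a language-neutral format such as PMML can be scored outside Python, while a pickle file cannot.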

Offline deployment, drift detection & retraining

Sparkflows lets you deploy models in offline mode, where the model scores batch jobs and scoring is triggered either by alerts or manually. The scores for each request are versioned and stored in Sparkflows, so they can later be compared either visually (manually) or via drift detection processors.
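The idea of versioned batch scores that can be compared later can be sketched in plain Python. This is an illustration only, not how Sparkflows stores scores: the `ScoreStore` class and its methods are hypothetical, and the store is kept in memory for simplicity.

```python
from statistics import mean, pstdev


class ScoreStore:
    """Hypothetical in-memory store that keeps each batch of scores under a version."""

    def __init__(self):
        self._batches = {}

    def add_batch(self, scores):
        """Store a batch and return its version number (1, 2, 3, ...)."""
        version = len(self._batches) + 1
        self._batches[version] = list(scores)
        return version

    def summary(self, version):
        """Return basic statistics for one stored batch."""
        scores = self._batches[version]
        return {"count": len(scores), "mean": mean(scores), "std": pstdev(scores)}

    def mean_shift(self, old_version, new_version):
        """Difference in mean score between two versions (new minus old)."""
        return self.summary(new_version)["mean"] - self.summary(old_version)["mean"]
```

A large `mean_shift` between two batch versions is one simple signal that the score distribution has moved and may deserve a closer look.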

Models can be deployed either through the Studio UI or via the SDK.

Online deployment, drift detection & retraining

Sparkflows lets you deploy models in online mode, as a Docker container or as a pod in Kubernetes, to serve real-time streaming scoring requests. The scores from these requests are versioned and stored. Sparkflows can automatically retrain the model when drift is detected. The logic for computing drift can be customized to the use case by writing a few lines of Python code.
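To give a sense of what a few lines of custom drift logic might look like, here is a minimal sketch that uses the Population Stability Index (PSI) as the drift measure. PSI is swapped in here as one common choice, not as the method Sparkflows uses. The function names, the bin count, and the 0.2 threshold are assumptions made for the example, and the code does not depend on any Sparkflows API.

```python
import math


def psi(baseline, current, bins=10):
    """Population Stability Index between two non-empty lists of scores."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    # Fall back to a width of 1.0 when all scores are identical.
    width = (hi - lo) / bins or 1.0

    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(scores), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def should_retrain(baseline, current, threshold=0.2):
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return psi(baseline, current) > threshold
```

A retraining trigger could call `should_retrain` with the scores from the training period as `baseline` and the most recent scores as `current`. The threshold should be tuned to the use case.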

The returned scores can be published to Kafka, written to disk or databases, or pushed to BI tools to build dashboards.
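Two of these destinations, disk and a database, can be sketched with the Python standard library alone. The record fields (`request_id`, `model_version`, `score`), the table name, and the function names are assumptions for illustration and do not reflect Sparkflows' actual output schema. Publishing to Kafka is left out because it needs a third-party client.

```python
import json
import sqlite3


def to_jsonl(records):
    """Render score records as JSON Lines text, one record per line."""
    return "".join(json.dumps(rec, sort_keys=True) + "\n" for rec in records)


def write_scores_file(records, path):
    """Write score records to disk as a JSON Lines file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))


def write_scores_sqlite(records, conn):
    """Insert score records into a SQLite table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(request_id TEXT, model_version INTEGER, score REAL)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (:request_id, :model_version, :score)",
        records,
    )
    conn.commit()
```

A BI tool could then read the `scores` table directly, or ingest the JSON Lines file, to build dashboards over score history.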