Sparkflows enables users to build machine learning models with 150+ processors that perform well out of the box. Its statistical and domain-based feature engineering processors complement multiple AutoML engines for both supervised and unsupervised learning use cases.
Sparkflows serves users with or without data science expertise, coders and non-coders alike.
Sparkflows' prebuilt processors create statistical and domain-based features through operations such as imputation, vectorization, tokenization, feature scaling, polynomial expansion, normalization, and date/timestamp extraction (for example, month from date or year from timestamp).
Feature engineering in Sparkflows applies selection, construction, transformation, and extraction techniques to derive the key features from your raw data.
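As an illustration of the kinds of transformations these processors chain together, here is a minimal sketch using scikit-learn; the specific pipeline and data are assumptions for demonstration, not Sparkflows' internal implementation.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Raw tabular data with a missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# Imputation -> scaling -> polynomial expansion, chained like processors in a flow.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # fill NaN with the column mean
    ("scale", StandardScaler()),                  # zero mean, unit variance
    ("poly", PolynomialFeatures(degree=2)),       # add interaction and squared terms
])

features = pipeline.fit_transform(X)
print(features.shape)  # (3, 6): bias, 2 originals, 3 degree-2 terms
```

Each step maps to a prebuilt processor in a Sparkflows workflow, so the same pipeline can be assembled visually without writing this code.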
ML Engines, Models and AutoML
Sparkflows provides a variety of ML engines that can be leveraged to build models, including Apache Spark, H2O, Keras, Prophet, PyCaret, and Scikit-learn.
Users can leverage Sparkflows' AutoML processors to quickly build models that perform well in the real world. These models can then be extracted and tuned to balance accuracy against interpretability, as the use case requires.
Sparkflows can be used to build supervised and unsupervised ML models on tabular data, supervised ML models on time-series data, and deep learning models on unstructured data.
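To show what an AutoML engine automates, the sketch below fits several candidate models and ranks them by cross-validated score. The candidate list and scoring are illustrative assumptions; Sparkflows' actual AutoML processors wrap engines such as H2O and PyCaret, which also tune hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic tabular classification data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Score each candidate with 3-fold cross-validation and pick the best.
scores = {name: cross_val_score(model, X, y, cv=3).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Note that a simpler runner-up (for example, logistic regression) may be preferred over the top scorer when interpretability matters more than raw accuracy, which is the tradeoff described above.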
Code in your features and ML models
In addition to the prebuilt processors, Sparkflows provides processors for writing custom code in PySpark, Python, or Scala to create new features for bespoke feature engineering needs.
Sparkflows also enables users to build custom models by writing code in PySpark, Python, or Scala, for example building a U-Net in Keras or trying out a new research library to prototype a predictive model.
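A minimal sketch of the kind of custom feature code one might place in a Python code processor; the function name and DataFrame columns here are hypothetical, not Sparkflows' actual processor API.

```python
import pandas as pd

def add_date_features(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Derive month and year features from a datetime column."""
    out = df.copy()
    out[f"{date_col}_month"] = out[date_col].dt.month
    out[f"{date_col}_year"] = out[date_col].dt.year
    return out

# Example input: two orders with dates.
df = pd.DataFrame({"order_date": pd.to_datetime(["2023-01-15", "2024-06-30"])})
df = add_date_features(df, "order_date")
print(df[["order_date_month", "order_date_year"]])
```

The same transformation could be written in PySpark or Scala when the data lives in a Spark DataFrame rather than a pandas one.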
Push down ML modeling at scale
Sparkflows can leverage ML engines like Apache Spark, H2O, and Keras to push model training down to where the data resides, getting back only the trained models to be saved in the model registry. This approach minimizes data transfer overhead and enables Sparkflows to build ML models on petabytes of data.
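The push-down pattern can be sketched conceptually as follows: training runs where the data lives, and only the serialized model crosses the boundary. The dict-based registry and the training function below are stand-ins for illustration; a real deployment would use Sparkflows' model registry and an engine such as Spark or H2O.

```python
import pickle
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

def train_remotely() -> bytes:
    """Simulates training on the engine side; only model bytes leave this function."""
    X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
    model = LinearRegression().fit(X, y)
    # The serialized model is tiny relative to the training data,
    # so only a small artifact needs to be transferred back.
    return pickle.dumps(model)

model_registry = {}
model_registry["linreg_v1"] = train_remotely()  # data never crosses the boundary

# Later: load the registered model for scoring or deployment.
model = pickle.loads(model_registry["linreg_v1"])
print(type(model).__name__)  # LinearRegression
```

The key design point is that the training dataset is never materialized on the Sparkflows side; only the fitted coefficients travel, which is what makes petabyte-scale training practical.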