Sparkflows lets users build data pipelines from 100+ pre-built processors that validate, transform, and clean data. Users can also extend the processors using our SDK.
Sparkflows has push-down analytics built into its core architecture, so processing happens where the data resides, which simplifies data governance.
Sparkflows offers rich, interactive statistical visualizations, including correlation matrices, box plots, subplots, histograms, and graph plots, for quick insight into the data.
Sparkflows profiles data automatically, computing metadata such as column cardinality, correlations, and distinct values per column, and flagging outliers, among other checks.
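To make the profiling metadata above concrete, here is a minimal Python sketch of two of the listed checks: column cardinality and outlier flagging. It is not Sparkflows code or its API. The function name `profile_column` and the 1.5 × IQR outlier rule are assumptions chosen for illustration, and the source does not say which outlier method Sparkflows uses.

```python
from statistics import quantiles


def profile_column(values):
    """Compute simple profile metadata for one numeric column.

    Hypothetical helper for illustration only; not part of Sparkflows.
    """
    distinct = set(values)

    # Quartiles of the column (statistics.quantiles with n=4 returns Q1, Q2, Q3).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1

    # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    return {
        "count": len(values),
        "cardinality": len(distinct),
        "distinct_values": sorted(distinct),
        "outliers": outliers,
    }


if __name__ == "__main__":
    print(profile_column([10, 12, 11, 13, 12, 11, 10, 95]))
```

A real profiler would also handle nulls, non-numeric columns, and pairwise correlations. This sketch covers only a single numeric column.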
Sparkflows lets users build machine learning models from 150+ processors. Its statistical and domain-based feature engineering processors, together with multiple AutoML engines, help models perform well out of the box for both supervised and unsupervised learning use cases.
Sparkflows serves users with and without data science expertise, coders and non-coders alike.
Sparkflows lets users store the feature engineering pipeline and the ML model in a model registry, promote them to the production environment, set triggers for drift detection, and auto-train the pipeline.
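As an illustration of what a drift-detection trigger can look like, the sketch below compares a baseline feature sample with a recent one using the Population Stability Index (PSI). It signals retraining when the index exceeds a threshold. This is not Sparkflows code, and the source does not state which drift metric Sparkflows uses. PSI, the function names `psi` and `should_retrain`, the bin count, and the 0.2 threshold (a common rule of thumb) are all assumptions.

```python
import math


def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline and a recent sample.

    Hypothetical helper for illustration only; not part of Sparkflows.
    Bins are equal-width over the baseline's range.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(expected, actual, threshold=0.2):
    """Assumed trigger rule: retrain when PSI exceeds the threshold."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    baseline = [1, 2, 3, 4, 5, 6, 7, 8]
    recent = [7, 8, 7, 8, 8, 7, 8, 8]
    print(should_retrain(baseline, baseline), should_retrain(baseline, recent))
```

In practice, a trigger like this would run on a schedule against each monitored feature, and the threshold would be tuned per use case.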
Sparkflows supports both offline (batch) scoring and real-time streaming scoring with the models built.
Sparkflows lets coders build the data engineering pipeline, the feature engineering pipeline, and the machine learning model, select the best-performing one, and then abstract the details away into a simple, form-based UI Application for non-coders and business users.
The Application hides the complexity from the end user, who sees the app as a solution to the use case.
Sparkflows lets users collaborate with other team members through the share feature, which is tied to a project.
A business user can create and define a use case in a Sparkflows project. An admin user then grants access to the required data, and a data engineer builds the data engineering pipeline and hands it over to a data scientist, who builds, validates, and verifies a model and marks it for production deployment. Finally, the MLOps admin picks up the model and deploys it to production.
Sparkflows has a unified administrative console for configuring and managing users, clusters, and data connections, collecting usage statistics, viewing runtime statistics, checking YARN applications, viewing server and execution logs, and creating audit reports.
Sparkflows can be configured to download the admin statistics and share them for auditing and reporting.
Deployment Options and Integrations
Sparkflows can be deployed on-premises or in any cloud, with push-down processing. Sparkflows has deep integrations with Amazon Web Services, Azure, Databricks, Google Cloud, and Snowflake.
Sparkflows runs seamlessly on vanilla Kubernetes, Amazon EKS, Amazon ECS, Google Kubernetes Engine, and Azure Kubernetes Service, with elastic scaling capabilities.