
Data Preparation

Data cleaning and validation
Sparkflows enables users to build data pipelines with 100+ pre-built processors and custom-code processors to validate, transform, and clean data so that it is ready for building machine learning models.
Push-down analytics is built into the core architecture, so users need not worry about pulling data from different sources: processing happens where the data resides, and data governance constraints are therefore satisfied.
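To make this concrete, below is a minimal PySpark sketch of the kind of validation and cleaning logic such a pre-built processor encapsulates; in Sparkflows these steps would be configured visually as nodes rather than written by hand. The input path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean-and-validate").getOrCreate()

# Hypothetical input: a raw customer file on object storage
raw = spark.read.option("header", True).csv("s3a://my-bucket/customers.csv")

cleaned = (
    raw.dropDuplicates(["customer_id"])                 # de-duplicate on the key
       .filter(F.col("email").rlike(r"^[^@]+@[^@]+$"))  # keep rows with a plausible email
       .withColumn("age", F.col("age").cast("int"))     # enforce a numeric type
       .na.drop(subset=["customer_id"])                 # reject rows missing the key
)
```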
Data Connectors
Sparkflows can read and write 30+ file formats through its pre-built processors; each file type has its own advantages and the scenarios in which it fits well.
Sparkflows provides connectors to 20+ data sources for reading and writing data, including Amazon Redshift, Amazon S3, Databricks DBFS, Snowflake, SQL databases, and Google BigQuery, among others.
Sparkflows also supports reading and writing data to and from streaming endpoints and message queues such as Apache Kafka and RabbitMQ, as well as the Twitter firehose.
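As a rough illustration of what these connectors do under the hood, here is a plain-Spark sketch of three such reads (object storage, JDBC, and Kafka). All URLs, credentials, tables, and topics are placeholders, and the JDBC and Kafka reads assume the corresponding driver and connector packages are on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("connectors").getOrCreate()

# File format on object storage: Parquet on Amazon S3
events = spark.read.parquet("s3a://my-bucket/events/")

# SQL database over JDBC (e.g., Amazon Redshift or any SQL database)
accounts = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://cluster.example.com:5439/analytics")  # placeholder
    .option("dbtable", "public.accounts")
    .option("user", "reader")
    .option("password", "********")
    .load()
)

# Streaming endpoint: an Apache Kafka topic read as a streaming DataFrame
clicks = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "clickstream")
    .load()
)
```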


Statistical data preparation
Sparkflows has pre-built processors for data cleaning, statistical imputation, and data enrichment, including Join, Union, Filter, and Group, among others. Data can be prepared and cleaned before being fed to downstream data processes.
A coder can also write custom logic in Python, Scala, or Jython and plug it in as a node in a matter of minutes. These processors make the system extensible and give Sparkflows the power to tackle highly complex data preparation pipelines.
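The sketch below shows, in plain PySpark, the kind of logic these processors encapsulate: a median imputation (via Spark ML's Imputer) followed by a Join, Filter, and Group sequence. The data and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import Imputer

spark = SparkSession.builder.appName("statistical-prep").getOrCreate()

# Hypothetical inputs standing in for upstream pipeline output
customers = spark.createDataFrame(
    [(1, 1, 34.0, 52000.0), (2, 1, None, None), (3, 2, 29.0, 61000.0)],
    ["customer_id", "region_id", "age", "income"],
)
regions = spark.createDataFrame(
    [(1, "East"), (2, "West")], ["region_id", "region_name"]
)

# Statistical imputation: fill missing numeric values with the column median
imputer = Imputer(
    inputCols=["age", "income"],
    outputCols=["age_imp", "income_imp"],
    strategy="median",
)
customers_imp = imputer.fit(customers).transform(customers)

# Enrich (Join), Filter, and Group, mirroring the pre-built processors
result = (
    customers_imp.join(regions, on="region_id", how="left")
    .filter(F.col("age_imp") >= 18)
    .groupBy("region_name")
    .agg(F.avg("income_imp").alias("avg_income"))
)
result.show()
```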
Push down data preparation at scale
The entire data preparation pipeline runs as a Workflow in Sparkflows; Workflows are versioned and each Execution is tracked. These pipelines run in a push-down manner, and data is not pulled to a centralized location, which enables Sparkflows to prepare and process petabytes of data residing in data lakes.
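One concrete, if narrow, instance of push-down execution is predicate push-down over JDBC, sketched below in plain Spark: the filter is translated into a WHERE clause and evaluated by the source system, so only matching rows ever leave it. Connection details are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://warehouse.example.com:5432/sales")  # placeholder
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# Spark pushes this filter down to the database as a WHERE clause,
# so the full table is never pulled into the cluster.
recent = orders.filter("order_date >= '2023-01-01'")
recent.groupBy("region").count().show()
```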


Analytical Apps
Sparkflows enables coders to build the data engineering pipeline, the feature engineering pipeline, and the machine learning models, select the one that performs best, and then abstract away the details into a simplified, form-based UI application for non-coders and business users.
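A hedged sketch of the underlying idea, in plain Python: the coder's finished pipeline is wrapped in a function whose parameters correspond to the app's form fields, so a business user only supplies values. The function and parameter names are illustrative, not Sparkflows APIs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def churn_report(spark: SparkSession, input_path: str, min_tenure_months: int):
    """Each argument maps to one field on the app's UI form."""
    customers = spark.read.parquet(input_path)
    return (
        customers.filter(F.col("tenure_months") >= min_tenure_months)
        .groupBy("plan")
        .agg(F.avg("churn_score").alias("avg_churn_score"))
    )

# A non-coder triggers the same logic by filling in the form;
# a coder can call it directly, e.g.:
# report = churn_report(spark, "s3a://my-bucket/customers/", min_tenure_months=6)
```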
Collaboration
Sparkflows enables users to collaborate with other team members via the share feature, which is tied to a project. A business user can create and define a use case in a Sparkflows project, an admin grants access to the data required for the use case, a data engineer can then build the data engineering pipeline, and so on.
Deployment options and Integrations
Sparkflows can be deployed on-premises or in any cloud, and it has deep integrations with Amazon Web Services, Azure, Databricks, Google Cloud, and Snowflake.