
Self-Serve Advanced Analytics with Sparkflows on AWS

Updated: Feb 1, 2022

Sparkflows provides an advanced Analytics Studio for self-serve analytics and AI. Users can perform complex data preparation, data profiling, and advanced analytics, and build machine learning models. They can also build reports and dashboards.

Sparkflows has partnered with AWS to bring self-serve advanced analytics to AWS users. Fire Insights, the Sparkflows product, is deeply integrated with and certified on AWS. It can connect to and run jobs on AWS EMR, on AWS Glue, or on a standalone machine. Sparkflows integrates with S3, Redshift, SageMaker, Hive, Amazon MSK, and Kinesis.





Figure 1. Sparkflows Analytics Studio on AWS





Figure 2. Sparkflows Integration with AWS Services



Sparkflows enables end-to-end self-serve analytics, with the features shown below:





Figure 3. Sparkflows Features and Capabilities


Sparkflows lets users build workflows by combining processors. More than 345 processors are available to use in a workflow.
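
The idea of processors coming together in a workflow can be sketched in plain Python. This is only an illustration of the concept, not the Sparkflows API: the names `drop_missing`, `add_column`, and `run_workflow` are hypothetical, and a real workflow would run on a cluster rather than on an in-memory list.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A processor takes a list of rows and returns a new list of rows.
Processor = Callable[[List[Row]], List[Row]]


def drop_missing(column: str) -> Processor:
    """Hypothetical processor: drop rows where `column` is None or absent."""
    def run(rows: List[Row]) -> List[Row]:
        return [r for r in rows if r.get(column) is not None]
    return run


def add_column(name: str, fn: Callable[[Row], object]) -> Processor:
    """Hypothetical processor: add a column computed from each row."""
    def run(rows: List[Row]) -> List[Row]:
        return [{**r, name: fn(r)} for r in rows]
    return run


def run_workflow(rows: List[Row], processors: List[Processor]) -> List[Row]:
    """Apply each processor in order, feeding its output to the next one."""
    for processor in processors:
        rows = processor(rows)
    return rows


rows = [
    {"customer": "a", "amount": 120.0},
    {"customer": "b", "amount": None},
    {"customer": "c", "amount": 40.0},
]
workflow = [
    drop_missing("amount"),
    add_column("high_value", lambda r: r["amount"] > 100),
]
result = run_workflow(rows, workflow)
```

Here the data preparation step runs before the feature step, so the second processor never sees a missing amount.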





Figure 4. Sparkflows Workflow


Build Models with a variety of ML engines






Sparkflows provides drag-and-drop processors for ML algorithms from a variety of ML engines. These run on the cluster and read the data in a distributed fashion.


Integration with EMR and Glue


Sparkflows can connect to AWS EMR or AWS Glue and run workflows there. This lets jobs scale to petabytes of data and support hundreds of users. Once the infrastructure is in place, Sparkflows helps users quickly get business value from their setup.


Integration with S3

Sparkflows allows you to access your files on S3. The jobs run by Fire can read from and write to files on S3. The files can be in various formats, including CSV, JSON, XML, Excel, Parquet, and Avro. Fire also allows you to browse your files on S3, create folders, and upload and download files.
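
For readers who want to see what browsing files under an S3 location involves, here is a minimal Python sketch. It is not Fire's implementation. It assumes a boto3 S3 client is passed in (for example `boto3.client("s3")`), and the helper names `split_s3_uri` and `list_keys` are hypothetical.

```python
from typing import Iterator, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into (bucket, 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(s3_client, uri: str) -> Iterator[str]:
    """Yield every object key under the given S3 URI.

    `s3_client` is assumed to be a boto3 S3 client. The paginator is used
    because a single list_objects_v2 call returns a limited number of keys.
    """
    bucket, prefix = split_s3_uri(uri)
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3 installed and credentials configured, `list(list_keys(boto3.client("s3"), "s3://example-bucket/data/"))` would return the keys under that prefix; the bucket name here is a placeholder.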

Integration with Redshift

Fire is fully integrated with Redshift. Fire has processors for reading from and writing to Redshift. Users can also browse the Redshift databases and tables.

Integration with SageMaker


Fire is fully integrated with AWS SageMaker. Fire provides a number of processors for building models with SageMaker.

The data preparation and feature engineering jobs run on the EMR or Glue cluster. The SageMaker model nodes then connect to and run on SageMaker. Together these form a powerful combination for end-to-end machine learning.



Figure 5. Sparkflows - SageMaker Integration Architecture


Integration with Glue


Fire can submit analytical jobs to run on AWS Glue. The results and visualizations are displayed back in Fire Insights.
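
How Fire submits jobs internally is not described here. As a general illustration of submitting a job to Glue and waiting for it to finish, the sketch below uses the boto3 Glue client calls `start_job_run` and `get_job_run`. The function name `run_glue_job`, the job name, and the arguments are hypothetical, and the client is assumed to be created with `boto3.client("glue")`.

```python
import time
from typing import Dict

# Glue job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue_client, job_name: str, arguments: Dict[str, str],
                 poll_seconds: float = 30.0) -> str:
    """Start a Glue job run and block until it reaches a terminal state.

    `glue_client` is assumed to be a boto3 Glue client. Returns the final
    JobRunState string, for example "SUCCEEDED".
    """
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
    run_id = response["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        # Still starting or running: wait before asking again.
        time.sleep(poll_seconds)
```

A caller might use `run_glue_job(boto3.client("glue"), "example-job", {"--input_path": "s3://example-bucket/data/"})`, where the job name and argument are placeholders. Polling keeps the sketch simple; a production setup might react to job state change events instead.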


Summary


Sparkflows makes it easy to get value from your data lakes, EMR clusters, data on S3, Redshift, and more.


Users can log in and focus on their use cases, while Sparkflows provides enterprise scale and security. Users can collaborate and work in groups on their projects.


Users can easily connect to data sources, read the data, and do data preparation, data profiling, and model building, all of it running at scale on the EMR and Glue clusters.










