
Overview

Apache Spark powers much of the large-scale computing in organizations today. It can process both Streaming and Batch data, and it supports complex ETL and Machine Learning workloads.

Providing Apache Spark as a Service is very powerful: it lets many users access the available data and compute to perform Analytics, ETL, and Machine Learning. The challenge, however, is that today users access Spark either by writing complex Spark code or by being limited to SQL on Spark.
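
For illustration, here is the kind of hand-written PySpark code this typically involves. The file path and column names are hypothetical, and the snippet is only a minimal sketch of a simple aggregation job:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sales_report").getOrCreate()

    # Load a hypothetical sales dataset from CSV
    sales = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv("/data/sales.csv"))

    # Clean and aggregate: drop invalid rows, then total sales per region
    report = (sales
              .filter(F.col("amount") > 0)
              .groupBy("region")
              .agg(F.sum("amount").alias("total_amount")))

    report.show()

Even a job this small requires knowing the DataFrame API, packaging the code, and submitting it to a cluster.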


Sparkflows provides a powerful self-service layer on top of Apache Spark. Users can log in with their browser, access their data, and immediately start performing complex computing and machine learning on Spark.


Access Data and Compute through your Browser

With Sparkflows, users simply log in through their browser. Adding new users is easy, and the system scales to hundreds of users.

Once users log in, they can immediately:

  • Browse their data

  • Perform Data Prep and ETL

  • Perform Machine Learning

  • Visualize their data

  • Read data from, and write data to, their favorite systems.


Perform Data Preparation, Analytics/ML with Ease

Sparkflows makes it really easy to perform Data Preparation and powerful Analytics.

Sparkflows has a number of Processors for:

  • Reading data in a variety of formats.

  • Validating and preparing the data in various ways.

  • Performing Analytics and Machine Learning.
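
These processors map to standard Spark operations under the hood. As a rough sketch, the same read / validate / model flow in hand-written PySpark might look like the following; the file path, column names, and the choice of LogisticRegression are illustrative assumptions, not a description of Sparkflows internals:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("prep_and_ml").getOrCreate()

    # Read: Parquet here; CSV, JSON, Avro, etc. work the same way
    df = spark.read.parquet("/data/customers.parquet")

    # Validate and prepare: drop incomplete rows and duplicate customers
    clean = (df.dropna(subset=["age", "income", "churned"])
               .dropDuplicates(["customer_id"]))

    # Analytics/ML: assemble feature vectors and fit a simple classifier
    assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features")
    model = LogisticRegression(labelCol="churned").fit(assembler.transform(clean))

    print(model.coefficients)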


Perform Complex Dedup

Sparkflows makes it very easy to perform Complex Dedup.

In Big Data environments, many datasets come together from a variety of sources, and not all of them are tied together by a unique key. Dedup therefore becomes a common use case.

Sparkflows provides multiple powerful processors for performing dedup and matching of documents. They allow selecting from a variety of distance algorithms, applying different weights to various columns, and more.
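
As a concrete illustration of the idea, a weighted fuzzy dedup can be sketched in plain PySpark with a self-join and Levenshtein distance. The dataset, columns, weights, and 0.85 threshold below are all hypothetical, and this is not the Sparkflows implementation:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("fuzzy_dedup").getOrCreate()

    people = spark.read.parquet("/data/people.parquet")
    a, b = people.alias("a"), people.alias("b")

    # Compare each pair of records once (id ordering avoids mirror pairs)
    pairs = a.join(b, F.col("a.id") < F.col("b.id"))

    # Normalized Levenshtein similarity for one column, in [0, 1]
    def sim(col):
        dist = F.levenshtein(F.col("a." + col), F.col("b." + col))
        longest = F.greatest(F.length("a." + col), F.length("b." + col))
        return 1 - dist / longest

    # Weighted match score: the name matters more than the city here
    score = 0.5 * sim("name") + 0.3 * sim("address") + 0.2 * sim("city")

    pairs.where(score > 0.85).select("a.id", "b.id").show()

A full pairwise self-join is quadratic in the number of records, so real dedup pipelines typically add a blocking key (for example, the first letter of the name) to limit the comparisons.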


Perform Streaming Analytics

Building Streaming Analytics systems is, in general, very difficult.

Sparkflows makes it seamless to build and run Streaming Applications: read data from streaming sources such as Apache Kafka or Amazon Kinesis, perform the transforms and analytics, and save the results into the appropriate store.

Sparkflows enables building complex streaming jobs in minutes and running them immediately on a cluster.
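
A minimal Structured Streaming sketch of this read / transform / save pattern in PySpark follows; the broker address and topic name are placeholders, and the Kafka source additionally requires the spark-sql-kafka connector package on the cluster:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream_counts").getOrCreate()

    # Read: a stream of events from a Kafka topic
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load())

    # Transform: count events per one-minute window
    counts = events.groupBy(F.window("timestamp", "1 minute")).count()

    # Save: write to the console here; a real job would target a durable store
    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()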

Load Data into various Stores

With Sparkflows, loading data into various stores is fast and simple. Simply connect the data to a Data Store Processor, configure its parameters, and you are done.

Sparkflows provides a number of connectors out of the box, including:

  • Apache Hive

  • Apache HBase

  • Elasticsearch

  • Apache Solr

  • MongoDB

  • Apache Kafka

If a connector you need is not available, contact us. You can also add your own connector to Fire Insights and start using it.
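
For a sense of what a Data Store Processor does under the hood, here is a minimal PySpark sketch that loads a DataFrame into a Hive table. The paths and table name are hypothetical, and other stores mostly differ in the write format and options (for example, the Elasticsearch, MongoDB, and Kafka connectors each provide their own Spark data source):

    from pyspark.sql import SparkSession

    # Hive support must be enabled on the session for saveAsTable to use Hive
    spark = (SparkSession.builder
             .appName("load_to_hive")
             .enableHiveSupport()
             .getOrCreate())

    # A hypothetical result dataset produced by an upstream workflow
    results = spark.read.parquet("/data/results.parquet")

    # Load: write the data as a managed Hive table
    results.write.mode("overwrite").saveAsTable("analytics.daily_results")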
