
Overview

Organizations across many industries are working to build powerful Recommender Systems that provide great recommendations to their users at any time.

However, building successful Recommender Systems has been extremely challenging for several reasons. Cleaning the incoming datasets, joining very different datasets, further enriching them, building big data machine learning models to predict recommendations, and loading these models and their associated datasets into serving stores like Apache HBase and Apache Cassandra is complex, requiring a lot of data processing, coordination, and orchestration. Additionally, handling near-real-time (NRT) data adds even more complexity to an already complex problem.

Sparkflows addresses this by letting each of the above steps be done with pre-built Connectors, Processors, and Workflows. In addition to standard connectors and processors, Sparkflows provides streaming workflows for processing NRT data and loading it into HBase and other stores. As a result, pipelines are built and tested in hours instead of weeks.

Sparkflows also has built-in support for machine learning: you can test various ML models, compute results, and load them into HBase and other stores for serving. Combined with its streaming workflows, this lets Sparkflows support the Lambda Architecture, which incorporates both batch and streaming processing to get great results!

It is extremely important to recommend the right things at the right time to every person

There are way too many options today

Everyone is Overloaded

But each person consumes only a select few things

What Do Consumers Expect?

Consumers expect systems, including websites, to be highly intelligent, to understand their needs, and to recommend products and services they would like at that specific moment. Consumers love systems that seem to read their minds and make their engagement seamless.

recommendation-datasets.png

How do Consumers Consume Recommendations?

Consumers encounter recommendations in many forms and contexts.

Datasets from a wide variety of systems are used to predict recommendations that drive sales and engagement.

However, Building Powerful End-to-End Recommender Systems Is Extremely Complex

Challenges

Distributed Systems

Data from many different systems needs to be connected, and handling various file formats, images, etc., quickly gets daunting

Complex Jobs for Data Enriching

Acquiring, Cleaning, Combining and Enriching Big Data is very complex

Building Jobs for Performance

Building Big Data Batch & Streaming Jobs for Performance is very hard

Predicting Recommendations

Choosing algorithms and predicting recommendations on Big Data is very challenging

Operationalization

Operationalizing the distributed system end to end quickly becomes complex

Team Size

Most teams are not large enough to build and operationalize these complex, end-to-end Big Data systems

Various Kinds of Recommender Systems

Collaborative Filtering

People who agreed in the past will agree in the future

Content Based Filtering

Recommend Items similar to what the user liked in the past

Hybrid Recommendations

Collaborative Filtering + Content Based

Frequent Pattern Growth

Find Items which are frequently bought together

Simple Aggregates

Top-N Most Popular, Recent Uploads

Search Based

Using Search Engines
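As a rough illustration of the collaborative-filtering idea above, the following plain-Python sketch scores unseen items for a user by weighting other users' ratings by how similar those users are. It is a minimal user-based example under stated assumptions, not how Sparkflows implements collaborative filtering: the ratings data, the function names `cosine` and `recommend`, and the use of cosine similarity over co-rated items are all hypothetical choices made for illustration.

```python
from math import sqrt

# Hypothetical explicit ratings: user -> {item: rating}
RATINGS = {
    "alice": {"book": 5, "film": 3, "game": 4},
    "bob":   {"book": 4, "film": 3, "game": 5, "song": 4},
    "carol": {"book": 1, "film": 5, "song": 2},
}

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users rated (0.0 if none shared)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend(user: str, ratings: dict, k: int = 2) -> list:
    """Rank items the user has not rated by similarity-weighted average rating of other users."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, r in their.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:k]
```

Here `recommend("alice", RATINGS)` returns `["song"]`, the only item Alice has not rated. Comparing every pair of users does not scale well to big data, which is one reason matrix-factorization methods such as ALS are commonly used for collaborative filtering on Spark.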

Recommender Systems can be built on Sparkflows quickly using the pre-built connectors and processors

sparkflows-recommender-flow.png

Recommender System powered by Sparkflows

Sparkflows powers each step of building a Recommender System. Building a Recommender System is a highly iterative, complex process involving many people.

As a result, such systems are immensely difficult to build out.

Sparkflows makes it seamless to power each step of the process. It makes it easy for anyone to understand and update the system at any point in time.

customer-360-read-data.png

Step 1: Choose your data source

Sparkflows supports a variety of data sources, both batch and streaming.

Connectors for CSV, Apache Kafka, JDBC, Marketo, MongoDB, Apache HBase, etc., are available out of the box. You will need to configure them to point to the right data source.

clean-transform-enrich.png

Step 2: Clean, Transform, Combine and Enrich

Clean, Combine, Join, De-duplicate, Transform and Enhance data with 200+ pre-built processors.
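To make the Step 2 operations more concrete, here is a small plain-Python sketch of cleaning, de-duplicating, and joining (enriching) event records. It only illustrates the ideas and does not use Sparkflows processors; the record fields (`user`, `item`, `ts`), the item catalog, and the function name `clean_dedupe_enrich` are hypothetical.

```python
# Hypothetical raw click events: one exact duplicate and one record with a missing user.
RAW_EVENTS = [
    {"user": "u1", "item": "i1", "ts": 100},
    {"user": "u1", "item": "i1", "ts": 100},   # duplicate
    {"user": None, "item": "i2", "ts": 101},   # missing user, will be dropped
    {"user": "u2", "item": "i2", "ts": 102},
    {"user": "u2", "item": "i9", "ts": 103},   # item not present in the catalog
]

# Hypothetical item catalog used to enrich events.
CATALOG = {"i1": {"category": "books"}, "i2": {"category": "music"}}

def clean_dedupe_enrich(events, catalog):
    seen = set()
    out = []
    for e in events:
        # Clean: drop records missing required fields.
        if not e.get("user") or not e.get("item"):
            continue
        # De-duplicate on the natural key (user, item, timestamp).
        key = (e["user"], e["item"], e["ts"])
        if key in seen:
            continue
        seen.add(key)
        # Left join with the catalog: attach the category, or "unknown" if absent.
        category = catalog.get(e["item"], {}).get("category", "unknown")
        out.append({**e, "category": category})
    return out
```

Using a left join keeps events for items missing from the catalog instead of silently dropping them, so gaps in reference data stay visible downstream.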

Step 3: Build a Variety of Recommenders with Sparkflows

Collaborative Filtering

Using the ALS (Alternating Least Squares) Processor

Rich User & Item Profiles

Using the power of 190+ Processors

Content Based Filtering

Using the Similarity & Clustering Processors

Frequent Pattern Mining

Using FP-Growth Processor

Top-N

Easily Compute various Aggregates

Stream Processing

Handle NRT data with seamless Streaming Processors
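The "frequently bought together" idea can be sketched by counting co-occurring item pairs across baskets and keeping the pairs that meet a minimum support. Note that this is plain pair counting, not FP-Growth: FP-Growth mines frequent itemsets of any size using a compact tree structure instead of enumerating candidate pairs. The baskets, the `min_support` threshold, and the function name `frequent_pairs` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (sets of item names).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]

def frequent_pairs(baskets, min_support: int):
    """Return {(item_a, item_b): count} for pairs appearing in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) count together.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

With `min_support=2`, only `("bread", "butter")` and `("butter", "milk")` survive, each seen in two baskets.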

Step 4: Build Hybrid Recommender Systems

hybrid-recommender.png

Easily combine the results of the various Recommender Systems you build to get great results
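One common way to build a hybrid recommender is a weighted blend: normalize each recommender's scores to a common range, then take a weighted sum per item. The sketch below assumes min-max normalization and fixed, hand-picked weights; these choices, the sample scores, and the function names are hypothetical, and the page does not say how Sparkflows combines results.

```python
# Hypothetical outputs of two recommenders on different scales.
CF_SCORES = {"a": 4.5, "b": 3.0, "c": 2.0}        # e.g. predicted ratings on a 1-5 scale
CONTENT_SCORES = {"b": 0.9, "c": 0.7, "d": 0.1}   # e.g. similarity scores in [0, 1]

def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(score_maps: dict, weights: dict) -> list:
    """Weighted sum of normalized scores; returns (item, score) pairs, best first."""
    combined = {}
    for name, scores in score_maps.items():
        w = weights.get(name, 0.0)
        for item, s in min_max(scores).items():
            combined[item] = combined.get(item, 0.0) + w * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

With equal weights (`{"cf": 0.5, "content": 0.5}`), item `b` ranks first because both recommenders score it reasonably well, even though the collaborative-filtering scores alone favor `a`.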

customer-360-predictions.png

Step 5: Apply more ML/NLP

Enrich the user and item profiles with more ML/NLP in Sparkflows.

recommendation-power-apps.png

Step 6: Load Recommendations into Serving Stores & Power Intelligent Applications 

Load profiles and recommendations into serving stores such as Apache HBase, Apache Cassandra, and Elasticsearch, and power intelligent applications such as Personalization, Virtual Assistants, Proactive Case, Demand Prediction, Churn Prediction, Fraud Detection, etc., with ease.
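Key-value serving stores are typically read with a single key lookup at request time, so recommendations are usually precomputed per user and stored under a user key. The sketch below uses a plain Python dict as a stand-in for such a store; the `user:` key prefix, the JSON encoding, and the fallback to popular items for unknown users are assumptions for illustration, not a Sparkflows, HBase, or Cassandra API.

```python
import json

def build_serving_rows(recs_by_user: dict, prefix: str = "user:") -> dict:
    """Turn precomputed recommendations into key -> JSON-string rows (dict stands in for the store)."""
    return {f"{prefix}{user}": json.dumps(items) for user, items in recs_by_user.items()}

def lookup(rows: dict, user: str, fallback: list, prefix: str = "user:") -> list:
    """One key lookup per request; unknown users get the fallback list (e.g. Top-N popular)."""
    raw = rows.get(f"{prefix}{user}")
    return json.loads(raw) if raw is not None else fallback
```

Keeping the request path to a single lookup is what lets the heavy batch and streaming computation stay out of the serving latency budget.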

recommendation-sparkflows-arch.png

Bringing it All Together

Sparkflows makes it seamless to build out the various Powerful Recommender Systems.

Sparkflows handles both streaming and batch workloads, thus enabling the Lambda Architecture. Process streams from Apache Kafka and load them into HBase, Solr, etc.

Process batch jobs, perform ML/NLP and load results into the serving stores.
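The Lambda Architecture idea above can be sketched in a few lines: a batch view recomputed from all historical data, a real-time view updated incrementally from the stream, and a query that merges the two. This toy version counts item popularity for a Top-N list; the class and method names are hypothetical, in-memory counters stand in for Kafka and the serving stores, and it assumes each batch run covers every event the real-time view has seen so far.

```python
from collections import Counter

class LambdaViews:
    """Toy Lambda Architecture: batch view + real-time (speed) view, merged on read."""

    def __init__(self):
        self.batch_view = Counter()     # recomputed periodically from all historical data
        self.realtime_view = Counter()  # updated incrementally for each streaming event

    def run_batch(self, all_events):
        # Batch layer: recompute from scratch over the full dataset of (user, item) events.
        self.batch_view = Counter(item for _user, item in all_events)
        # Assumption: this batch run includes every event the speed layer has counted,
        # so the real-time view can be reset without losing or double-counting events.
        self.realtime_view.clear()

    def on_stream_event(self, event):
        # Speed layer: cheap incremental update for events not yet in the batch view.
        _user, item = event
        self.realtime_view[item] += 1

    def top_n(self, n: int) -> list:
        # Serving layer: merge both views at query time and return the n most popular items.
        merged = self.batch_view + self.realtime_view
        return [item for item, _ in merged.most_common(n)]
```

Resetting the real-time view after each batch run is the simplest way to avoid double counting in this toy; production systems usually track which events each batch covered with timestamps or offsets instead.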

The Sparkflows Difference


10X Faster

Build out use cases in weeks instead of months with native connectors and processors


Iterate Quickly

Iterate quickly with visual workflows and built-in version control


Go Further

Go even further with built-in nodes for ML, NLP, Sentiment analysis, etc.
