Overview
Organizations across many industries are building powerful Recommender Systems that aim to serve relevant recommendations to their users at any time.
However, building successful Recommender Systems has been extremely challenging for several reasons. Cleaning incoming datasets, joining very different datasets, enriching them further, building big data machine learning models to predict recommendations, and loading those models and the associated datasets into serving stores like Apache HBase and Apache Cassandra becomes complex, requiring a lot of data processing, coordination, and orchestration. Additionally, handling near-real-time (NRT) data adds even more complexity to an already complex problem.
Sparkflows solves this fluently by allowing each of the above steps to be done with pre-built Connectors, Processors, and Workflows. In addition to standard connectors and processors, Sparkflows provides streaming workflows for processing NRT streaming data and loading it into HBase, etc. Thus, pipelines are built and tested in a matter of hours instead of weeks.
Sparkflows has built-in support for machine learning to test various ML models, compute results, and load them into HBase, etc., for serving. It thus smoothly supports the Lambda Architecture, incorporating both Batch and Streaming to get great results!
It is extremely important to recommend the right things at the right time to every person.
There are far too many options today.
Everyone is overloaded with choices.
Yet everyone consumes only a select few.
What Do Consumers Expect?
Consumers expect systems (including websites) to be highly intelligent, understand their needs, and recommend the products and services they would like at that specific time. Consumers love systems that can read their minds and make their engagement seamless.
How do Consumers Consume Recommendations?
There are several kinds of Recommendations, and several contexts in which consumers consume them.
Several datasets from a wide variety of systems are used to predict Recommendations that drive sales and engagement.
However, Building Powerful End-to-End Recommender Systems is Extremely Complex
Challenges
Distributed Systems
Data from too many systems needs to be connected. Handling various file formats, images, etc., gets daunting
Complex Jobs for Data Enrichment
Acquiring, Cleaning, Combining and Enriching Big Data is very complex
Building Jobs for Performance
Building Big Data Batch & Streaming Jobs for Performance is very hard
Predicting Recommendations
Choosing algorithms and predicting recommendations with Big Data gets very challenging
Operationalization
Operationalizing the distributed system end to end quickly becomes complex
Team Size
Most teams are not large enough to build and operationalize these complex end-to-end Big Data systems
Various Kinds of Recommender Systems
Collaborative Filtering
People who agreed in the past will agree in the future
Content Based Filtering
Recommend Items similar to what the user liked in the past
Hybrid Recommendations
Collaborative Filtering + Content Based Filtering
Frequent Pattern Mining
Find items that are frequently bought together
Simple Aggregates
Top N, Most Popular, Recent Uploads
Search Based
Using Search Engines
Recommender Systems can be built on Sparkflows quickly, using the pre-built connectors and processors.
Recommender System powered by Sparkflows
Sparkflows powers each step of building a Recommender System. Building a Recommender System is a highly iterative, complex process with many people involved, which makes it immensely difficult to build out.
Sparkflows makes it seamless to power each step of the process. It makes it easy for anyone to understand and update the system at any point in time.
Step 1: Choose your data source
Sparkflows supports a variety of data sources, both batch and streaming.
Connectors for CSV, Apache Kafka, JDBC, Marketo, MongoDB, Apache HBase, etc., are available out of the box. You will need to configure them to point to the right data source.
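Sparkflows workflows execute on Apache Spark, so each connector configuration corresponds roughly to a Spark read. Below is a minimal PySpark sketch of the two styles of source; the paths, broker address, and topic name are placeholders, not Sparkflows APIs.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("recommender-ingest").getOrCreate()

# Batch source: a CSV file of user ratings (placeholder path)
ratings = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/data/ratings.csv"))

# Streaming source: click events from Apache Kafka (placeholder brokers/topic)
clicks = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "click-events")
          .load())
```
In Sparkflows, these reads are configured visually in the connector nodes rather than coded by hand.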
Step 2: Clean, Transform, Combine and Enrich
Clean, Combine, Join, De-duplicate, Transform and Enhance data with 200+ pre-built processors.
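Continuing from the ingest sketch in Step 1, the equivalent of chaining a few such processors in raw PySpark looks roughly like this; the column names and item catalog path are illustrative assumptions.
```python
from pyspark.sql import functions as F

# Clean: drop rows with missing keys and normalize the rating column
clean = (ratings
         .dropna(subset=["user_id", "item_id"])
         .withColumn("rating", F.col("rating").cast("double")))

# De-duplicate repeat events, keeping one row per (user, item) pair
deduped = clean.dropDuplicates(["user_id", "item_id"])

# Enrich: join with an item catalog to attach attributes such as category and price
items = spark.read.parquet("/data/items.parquet")  # placeholder path
enriched = deduped.join(items, on="item_id", how="left")
```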
Step 3: Build Variety of Recommenders with Sparkflows
Collaborative Filtering
Using the ALS Processor (see the Spark sketch after this list)
Rich User & Item Profiles
Using the power of 190+ Processors
Content Based Filtering
Using Similarity Processor & Clustering Processors
Frequent Pattern Mining
Using FP-Growth Processor
Top-N
Easily Compute various Aggregates
Stream Processing
Handle NRT data with seamless Streaming Processors
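The recommenders above map onto standard Spark MLlib algorithms. Here is a rough sketch of the ALS and FP-Growth pieces, assuming the enriched ratings DataFrame from Step 2 with numeric user_id and item_id columns.
```python
from pyspark.sql import functions as F
from pyspark.ml.recommendation import ALS
from pyspark.ml.fpm import FPGrowth

# Collaborative filtering with ALS (ALS expects numeric user and item IDs)
als = ALS(userCol="user_id", itemCol="item_id", ratingCol="rating",
          rank=10, maxIter=10, regParam=0.1, coldStartStrategy="drop")
als_model = als.fit(enriched)
top_n = als_model.recommendForAllUsers(10)        # Top-10 items per user

# Frequent pattern mining with FP-Growth on per-user item baskets
baskets = enriched.groupBy("user_id").agg(F.collect_set("item_id").alias("items"))
fp = FPGrowth(itemsCol="items", minSupport=0.01, minConfidence=0.2)
fp_model = fp.fit(baskets)
rules = fp_model.associationRules                 # "frequently bought together" rules
```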
Step 4: Build Hybrid Recommender Systems
Easily combine the results of the various Recommender Systems built above to get even better results.
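One simple way to hybridize is a weighted blend of the individual scores. The sketch below combines the ALS output from Step 3 with a hypothetical content_scores DataFrame (columns user_id, item_id, content_score); the 0.7/0.3 weights are arbitrary assumptions.
```python
from pyspark.sql import functions as F

# Flatten the ALS Top-N output into (user_id, item_id, cf_score) rows
cf_scores = (top_n
             .select("user_id", F.explode("recommendations").alias("rec"))
             .select("user_id",
                     F.col("rec.item_id").alias("item_id"),
                     F.col("rec.rating").alias("cf_score")))

# Blend with content-based scores; missing scores default to 0.0
hybrid = (cf_scores
          .join(content_scores, ["user_id", "item_id"], "outer")
          .fillna(0.0, subset=["cf_score", "content_score"])
          .withColumn("score",
                      0.7 * F.col("cf_score") + 0.3 * F.col("content_score")))
```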
Step 5: Apply more ML/NLP
Enrich the user and item profiles with more ML/NLP in Sparkflows.
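For example, free-text item descriptions can be turned into TF-IDF features with Spark ML's built-in stages. This is a sketch assuming the item catalog from Step 2 has a description column.
```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, StopWordsRemover, HashingTF, IDF

# Turn item descriptions into TF-IDF vectors for richer item profiles
nlp_pipeline = Pipeline(stages=[
    Tokenizer(inputCol="description", outputCol="tokens"),
    StopWordsRemover(inputCol="tokens", outputCol="filtered"),
    HashingTF(inputCol="filtered", outputCol="tf", numFeatures=1 << 16),
    IDF(inputCol="tf", outputCol="features"),
])
item_profiles = nlp_pipeline.fit(items).transform(items)
```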
Step 6: Load Recommendations into Serving Stores & Power Intelligent Applications
Load the recommendations and profiles into serving stores such as Apache HBase, Apache Cassandra, and Elasticsearch, and power intelligent applications such as Personalization, Virtual Assistants, Proactive Case handling, Demand Prediction, Churn Prediction, Fraud Detection, etc., with ease.
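With the open-source Spark connectors on the classpath, writing the blended recommendations out is a short snippet per store. In the sketch below the keyspace, table, index, and host names are all placeholder assumptions.
```python
# Apache Cassandra via the Spark Cassandra Connector
(hybrid.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="recsys", table="user_recommendations")   # placeholders
 .mode("append")
 .save())

# Elasticsearch via the elasticsearch-hadoop connector
(hybrid.write
 .format("org.elasticsearch.spark.sql")
 .option("es.nodes", "elastic-host:9200")                    # placeholder
 .mode("append")
 .save("recommendations"))                                   # placeholder index
```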
Bringing it All Together
Sparkflows makes it seamless to build out these various powerful Recommender Systems.
Sparkflows handles both Streaming and Batch workloads, thus enabling the Lambda Architecture. Process streams from Apache Kafka and load them into HBase/Solr, etc.
Process batch jobs, perform ML/NLP, and load the results into the serving stores.
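On the streaming leg of the Lambda Architecture, the Kafka-to-serving-store path can be expressed as a Spark Structured Streaming query. A sketch continuing from the clicks stream in Step 1; the table and checkpoint locations are placeholders.
```python
from pyspark.sql import functions as F

# Project the raw Kafka records into typed columns
events = clicks.select(F.col("key").cast("string").alias("user_id"),
                       F.col("value").cast("string").alias("event"))

def write_batch(batch_df, batch_id):
    # Append each micro-batch to the serving store using the batch connector
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="recsys", table="recent_events")           # placeholders
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_batch)
         .option("checkpointLocation", "/tmp/recsys-checkpoint")  # placeholder
         .start())
```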
Sparkflows Difference
10X Faster
Build out use cases in weeks instead of months with native connectors and processors
Iterate Quickly
Iterate quickly with visual workflows and built-in version control
Go Further
Go even further with built-in nodes for ML, NLP, Sentiment Analysis, etc.