Sparkflows Features
Introduction to Generative AI
The Sparkflows platform lets customers use Generative AI capabilities either by hosting models and infrastructure in-house (on-prem or in the cloud within a VPC) or by calling licensed models such as GPT-4 through APIs.

Visual Application Development
Build workflows by dragging and dropping
Rich collection of 250+ Processors
View results of previous runs
Machine Learning
Classification / Clustering / Regression
Collaborative Filtering
Save/Load Model / Predict
Cross Validator
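
As a rough illustration of what a cross validator does (this is not Sparkflows code, and the function names below are hypothetical), the following sketch splits a dataset into k folds, holds out each fold in turn, and averages the error of a trivial baseline that predicts the mean of the training fold:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(ys, k):
    """Average mean-absolute-error of a predict-the-training-mean baseline."""
    errors = []
    for train, test in k_fold_indices(len(ys), k):
        mean = sum(ys[i] for i in train) / len(train)
        errors.append(sum(abs(ys[i] - mean) for i in test) / len(test))
    return sum(errors) / len(errors)


folds = list(k_fold_indices(10, 3))
score = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
```

In practice the baseline would be replaced by whichever model is being tuned, and the averaged score would be compared across parameter settings.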
Machine Learning Engines
SparkML
H2O
TensorFlow
scikit-learn
XGBoost
Data Preparation
Prepare Data Seamlessly
Connect to various Sources & Sinks
Filter, join, group, validate, and impute data
Connect to various Sources & Sinks
Batch sources: HDFS, Apache Hive, Amazon S3
Streaming sources: Kafka, Flume
NoSQL sources: HBase, Solr, Elasticsearch
File Formats
Work with a variety of file formats including CSV/TSV, Avro, Parquet, JSON.
Intelligent Schema Inference for the various Datasets
NLP/OCR
Perform NLP on large-scale data with Apache OpenNLP & StanfordNLP
Perform OCR with Tesseract
Multi-tenancy & User Management
Users can share Datasets and Workflows with groups
Create users with different roles & permissions
LDAP Integration
Visualization
View the output of workflows as line charts, histograms, and bar charts
View random forests visually
Feature Generation
Tokenization
TF-IDF, One Hot Encoder
String Indexer, Impute, Scaler
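
To make the transformations named above concrete, here is a minimal pure-Python sketch of tokenization, TF-IDF weighting, and one-hot encoding. It is an illustration only, not how Sparkflows implements these processors; the function names are hypothetical, and the TF-IDF variant shown (term frequency times log(N / document frequency)) is one of several common formulations.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf = term count / document length, idf = log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def one_hot(values):
    """Index each category (sorted order), then emit a 0/1 vector per value."""
    index = {v: i for i, v in enumerate(sorted(set(values)))}
    return [[1 if index[v] == i else 0 for i in range(len(index))] for v in values]


docs = ["spark makes big data simple", "big data needs big tools"]
weights = tf_idf(docs)
encoded = one_hot(["red", "green", "red"])
```

Terms that occur in every document (here "big" and "data") receive a weight of zero, which is the intended effect of the inverse document frequency term.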
Developer Toolkit
Add code using SQL, Scala, Jython nodes
Develop custom Nodes and have them available in Workflows
REST APIs
Access Sparkflows with a rich set of REST APIs.
Workflows/Datasets/Dashboards/Execute Workflows/Access Result of Execution/Browse HDFS/Browse Hive
Dashboards
Assemble the output of various workflows and nodes into a Dashboard
Build Dashboards from Relational Sources, adding filtering & drill down capabilities
Workflow Scheduling
Schedule workflows to run at various times of the day/week/month
Trigger workflows by events in a Kafka topic.
Streaming Analytics
Connect to Apache Kafka, Apache Flume, Sockets, Twitter
Perform Streaming Analytics
Load results into Apache HBase, Apache Solr, Elasticsearch etc.

The Sparkflows Experience

