Features
Uncover Data Brilliance: Embrace Sparkflows for Analytics Excellence
Generative AI
Gen AI App Framework
Hugging Face Integration
Cloud-LLM Integration
AWS Bedrock
Azure OpenAI
Copilot
Content Synthesis
Google Gemini
RAG & Vector DB Support
ChatGPT
Gen AI Solution Patterns
Nvidia
Content Generation
Database reporting
OCR
Chat agent
Visual Application Development
Build workflows by dragging and dropping
Rich collection of 500+ Processors
View results of previous runs
Machine Learning
Classification / Clustering / Regression
Collaborative Filtering
Save/Load Model / Predict
Cross Validator
Forecasting / Deep Learning
What If
Machine Learning Engines
SparkML
H2O
TensorFlow
scikit-learn
XGBoost
Data Preparation
Prepare Data Seamlessly
Connect to various Sources & Sinks
Filter Data, Joins, Groups, Data Validation, Imputation, etc.
Connect to various Sources & Sinks
Batch sources: HDFS, Apache Hive, Amazon S3
Streaming sources: Kafka, Flume
NoSQL sources: HBase, Solr, Elasticsearch
File Formats
Work with a variety of file formats, including CSV/TSV, Avro, Parquet, and JSON
Intelligent Schema Inference for the various Datasets
NLP/OCR
Perform NLP on large-scale data with Apache OpenNLP & StanfordNLP
Perform OCR with Tesseract
Multi-tenancy & User Management
Users can share Datasets and Workflows with groups
Create users with different roles & permissions
LDAP Integration
Visualization
View output of workflows as line charts, histograms, and bar charts
View random forests visually
Feature Generation
Tokenization
TF-IDF, One Hot Encoder
String Indexer, Impute, Scaler
Developer Toolkit
Add code using SQL, Scala, Jython nodes
Develop custom Nodes and have them available in Workflows
REST APIs
Access Sparkflows with a rich set of REST APIs
Workflows / Datasets / Dashboards / Execute Workflows / Access Results of Execution / Browse HDFS / Browse Hive
Dashboards
Assemble the output of various workflows and nodes into a Dashboard
Build Dashboards from Relational Sources, adding filtering & drill down capabilities
Workflow Scheduling
Schedule workflows to run at various times of the day/week/month
Trigger workflows by events in a Kafka topic
Streaming Analytics
Connect to Apache Kafka, Apache Flume, Sockets, Twitter
Perform Streaming Analytics
Load results into Apache HBase, Apache Solr, Elasticsearch, etc.
MLOps
Register
Deploy to Endpoints
Monitor
Track Feature Drift
Auto Retrain
Alerts & Notifications