Sparkflows Features

Introduction to Generative AI

The Sparkflows platform lets customers use Generative AI capabilities either by hosting models and infrastructure in-house (on-prem or in the cloud within a VPC) or via APIs to licensed models such as GPT-4.

Visual Application Development

Build workflows by dragging and dropping

Rich collection of 250+ Processors

View results of previous runs

Machine Learning

Classification / Clustering / Regression

Collaborative Filtering

Save/Load Model / Predict

Cross Validator
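As a rough illustration of what a cross validator does, the sketch below splits row indices into k folds so that each fold serves once as the held-out test set. It is a minimal plain-Python example, not the Sparkflows implementation; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A model would be trained on each `train` split and scored on the matching `test` split, and the scores averaged across folds.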

Machine Learning Engines

SparkML

H2O

TensorFlow

scikit-learn

XGBoost

Data Preparation

Prepare Data Seamlessly

Connect to various Sources & Sinks

Filter, join, group, validate, and impute data

Connect to various Sources & Sinks

Batch sources: HDFS, Apache Hive, Amazon S3

Streaming sources: Kafka, Flume

NoSQL sources: HBase, Solr, Elasticsearch

File Formats

Work with a variety of file formats, including CSV/TSV, Avro, Parquet, and JSON.

Intelligent Schema Inference for the various Datasets

NLP/OCR

Perform NLP on large-scale data with Apache OpenNLP & StanfordNLP

Perform OCR with Tesseract

Multi-tenancy & User Management

Users can share Datasets and Workflows with groups

Create users with different roles & permissions

LDAP Integration

Visualization

View workflow output as line charts, histograms, and bar charts

Visualize Random Forest models

Feature Generation

Tokenization

TF-IDF, One Hot Encoder

String Indexer, Impute, Scaler
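To make these feature generation steps concrete, here is a minimal plain-Python sketch of tokenization, TF-IDF weighting, and one-hot encoding. It only illustrates the concepts and is not how the Sparkflows processors are implemented. The function names are hypothetical, and the TF-IDF variant (term frequency normalized by document length, idf = log(N / df)) is one of several common definitions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def one_hot(values):
    """Map each categorical value to a 0/1 vector over the sorted distinct values."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))] for v in values]
```

Terms that appear in every document receive a weight of zero under this variant, which is why TF-IDF tends to highlight words that distinguish one document from the rest.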

Developer Toolkit

Add code using SQL, Scala, Jython nodes

Develop custom Nodes and have them available in Workflows

REST APIs

Access Sparkflows with a rich set of REST APIs.

APIs cover Workflows, Datasets, Dashboards, Execute Workflows, Access Result of Execution, Browse HDFS, and Browse Hive

Dashboards

Assemble the output of various workflows and nodes into a Dashboard

Build Dashboards from Relational Sources, adding filtering & drill-down capabilities

Workflow Scheduling

Schedule workflows to run at various times of the day, week, or month

Trigger workflows by events in a Kafka topic.

Streaming Analytics

Connect to Apache Kafka, Apache Flume, Sockets, Twitter

Perform Streaming Analytics

Load results into Apache HBase, Apache Solr, Elasticsearch etc.

The Sparkflows Experience

350+
