
Processor Categories

Fire Insights ships with a number of processors out of the box (OOTB) that help you quickly perform analytics and build intelligent applications.

IO

Connectors

  • Execute Query In SnowFlake

Executes the given query in Snowflake

  • Read Cassandra

Reads data from Apache Cassandra

  • Read Databricks Table

This node reads a table from Databricks

  • Read Elastic Search

Reads data from Elasticsearch

  • Read from Snowflake

Reads data from Snowflake

  • Read HIVE table

Reads data from an Apache HIVE table and creates a DataFrame from it

  • Read Marketo

Node for reading Marketo files

  • Read MongoDB

Reads data from MongoDB

  • Read Redshift-AWS

This node reads data from Redshift using JDBC.

  • Salesforce

This node reads data from Salesforce.

  • Save Cassandra

Saves the rows of the incoming DataFrame into Apache Cassandra

  • Save Databricks table

This node saves the input data as a table in Databricks

  • Save Elastic Search

Stores the rows of the incoming DataFrame into Elasticsearch

  • Save HBase

Saves all the rows in the incoming DataFrame onto Apache HBase using the specified field mapping

  • Save MongoDB

Saves the incoming DataFrame into MongoDB

  • Save Redshift-AWS

This node saves data to Redshift using JDBC.

  • SFTP

Transfers files over the Secure File Transfer Protocol (SFTP)

  • Write to Snowflake

Writes the incoming DataFrame to Snowflake
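All of these connectors ultimately read into or write out of a Spark DataFrame. As a rough, minimal sketch (not Fire Insights' actual implementation) of what a JDBC-backed read such as Read Redshift-AWS or DB2 JDBC amounts to in PySpark, where the URL, table, and credentials are placeholder assumptions:

```python
# Minimal sketch of a JDBC-backed connector read; connection details are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read-sketch").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/sales")  # placeholder URL
    .option("dbtable", "public.orders")                    # placeholder table
    .option("user", "reader")
    .option("password", "secret")
    .load()
)
df.printSchema()
```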

Read Structured

  • Create Datasets

Creates a dataset with the specified number of Rows and 9 pre-defined columns

  • Dataset Structured

Reads a dataset with a defined structure and creates a DataFrame from it

  • DB2 JDBC

This node reads data from IBM DB2 using JDBC.

  • Empty Dataset

It creates an empty DataFrame

  • JDBC Connection

Creates a connection to a relational database using JDBC

  • JDBC Incremental Load

Incrementally loads data from relational databases using JDBC

  • Query JDBC Connection

This node executes query in Relational Databases using JDBC and creates a DataFrame from it

  • Read Avro

Dataset Node for reading Apache Avro files

  • Read CSV

Reads in CSV files and creates a DataFrame from them

  • Read Excel

Dataset Node for reading Excel files

  • Read Hana CSV

Reads in SAP HANA CSV files and creates a DataFrame from them

  • Read JDBC

This node reads data from Relational Databases using JDBC and creates a DataFrame from it

  • Read JSON

Dataset Node for reading JSON files

  • Read Libsvm

Reads in Libsvm files and creates a DataFrame from them

  • Read Parquet

Dataset Node for reading Apache Parquet files


  • URL Text File Reader

Reads a text file from the given URL and creates a DataFrame from it. Each line in the file becomes a record in the DataFrame.
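In Spark terms, the file readers above boil down to calls on spark.read. A minimal sketch of the Read CSV behavior, with an assumed path and options:

```python
# Sketch of a "Read CSV" style load into a DataFrame; path and options are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-csv-sketch").getOrCreate()

df = (
    spark.read
    .option("header", True)        # treat the first line as column names
    .option("inferSchema", True)   # infer column types from the data
    .csv("/data/input/customers.csv")
)
df.show(5)
```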

Read Unstructured

  • Binary Files

Reads in Binary Files from a given path and loads them as FileName/Content

  • PDF

Reads in PDF Files from a given path and extracts the text content from them

  • PDF Image OCR

Reads in PDF Files from a given path, extracts the images from them and converts them to text with Tesseract

  • Text Files

Reads in Text Files from a given path and loads each line as a separate Row

  • Tika

Reads in files from a given path and parses them with Apache Tika

  • Whole Text Files

Reads in a directory of whole text files from a given path and loads each file as a separate Row, with the file name as the key and the file content as the value
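A minimal sketch of the Whole Text Files behavior in PySpark, assuming a hypothetical input directory; each file becomes one Row of (file name, file content):

```python
# Sketch of whole-text-file loading: one (fileName, fileContent) pair per file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("whole-text-sketch").getOrCreate()

rdd = spark.sparkContext.wholeTextFiles("/data/docs/")  # hypothetical path
df = rdd.toDF(["file_name", "file_content"])
df.show(truncate=40)
```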

Save

  • Insert Into HIVE table

Inserts the rows of the DataFrame into an existing Apache HIVE Table

  • Kafka producer

Writes out the DataFrame to the specified Apache Kafka topic

  • Save as HIVE table

Saves the DataFrame into an Apache HIVE Table

  • Save CSV

Saves the DataFrame into the specified location in CSV Format

  • Save JDBC

This node writes data to databases using JDBC.

  • Save JSON

Saves the DataFrame into the specified location in JSON Format

  • Save ORC

Saves the DataFrame into the specified location in ORC Format

  • Save Parquet

Saves the DataFrame into the specified location in Parquet Format. When running on Hadoop, it is saved onto HDFS.

  • Upsert JDBC

Inserts or updates data in databases using JDBC.
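The save nodes map onto DataFrame writers. A minimal sketch of the Save Parquet and Save JDBC behaviors, with placeholder paths and connection details:

```python
# Sketch of save-side nodes; output path and JDBC settings are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("save-sketch").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Save Parquet: on Hadoop the path resolves to HDFS
df.write.mode("overwrite").parquet("/data/output/sample_parquet")

# Save JDBC: write rows out to a relational table
(df.write.format("jdbc")
   .option("url", "jdbc:postgresql://dbhost:5432/sales")
   .option("dbtable", "public.sample")
   .option("user", "writer")
   .option("password", "secret")
   .mode("append")
   .save())
```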

Parse

  • Apache Logs

Reads in Apache Log files from a given path, parses them and loads them into a DataFrame

  • Field Splitter

This node splits the string of the specified input column using the specified delimiter

  • Fixed Length Fields

Splits the input into fields of fixed lengths

  • Multi Regex Extractor

This node extracts patterns from input columns using multiple regular expressions

  • OCR

Performs Optical Character Recognition using the Tesseract Library

  • Regex Tokenizer

This node creates a new DataFrame by taking text (such as a sentence) and breaking it into individual terms (usually words) based on a regular expression
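For orientation, the Regex Tokenizer description matches Spark ML's RegexTokenizer; a minimal sketch with an assumed pattern that splits on non-word characters:

```python
# Sketch of regex tokenization with Spark ML's RegexTokenizer.
from pyspark.sql import SparkSession
from pyspark.ml.feature import RegexTokenizer

spark = SparkSession.builder.appName("tokenizer-sketch").getOrCreate()
df = spark.createDataFrame([("Spark makes big data simple",)], ["text"])

# by default the pattern is used as the delimiter regex
tokenizer = RegexTokenizer(inputCol="text", outputCol="words", pattern="\\W")
tokenizer.transform(df).select("words").show(truncate=False)
```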

Prepare

  • Date Time Field Extract

Creates a new DataFrame by extracting Date and Time fields.

  • Date Difference

This node finds the difference between two dates

  • Date To String

This node converts a date/time column to a string with the given format

  • String To Date

This node converts a string column to a date using the given date/time format

  • String To Unix Time

This node converts a string to Unix Time

  • Unix Time To String

This node converts Unix Time to a string

  • Imputing With Constant

Imputes missing values with a constant. It fills missing values (None) in the selected columns with the given constant value for the corresponding column in the incoming DataFrame.

  • Imputing With Mean Value

Imputes the continuous variables with the mean.

  • Imputing With Median

Imputes missing values with the median of the corresponding column.

  • Imputing With Mode Value

Imputes missing values with the most frequently observed value. It fills missing values (None) in the selected columns with the most frequently observed value in the corresponding column in the incoming DataFrame.

  • Data Wrangling

This node creates a new DataFrame by applying each of the specified Rules

  • Dedup

This node is used for problems like entity resolution or data matching: finding and linking different mentions of the same entity within a single data source or across multiple data sources.

  • Drop Duplicate Rows

This node drops the duplicate rows of the incoming DataFrame

  • Drop Rows With Null

This node drops the rows containing null values

  • Remove Unwanted Characters

This node removes unwanted characters

  • Remove Duplicate Rows

This node takes an array of fields and compares rows on those fields. Rows that fully match are treated as duplicates; from each group of matches it randomly keeps one row and drops the rest.

  • Math Functions Multiple

Performs the specified math functions on multiple columns

  • String Functions

This node performs the specified String function on a row

  • String Functions Multiple

Performs the specified String functions on multiple columns

  • Text Case Transformer

This node converts text to upper or lower case

  • Compare All Columns

Compares 2 incoming DataFrames

  • Compare All Columns Single Output

Compares 2 incoming DataFrames and produces a single output

  • Compare Specific Columns

Compares 2 incoming DataFrames on specific columns.

  • Split By Expression

This node splits the incoming DataFrame into two output DataFrames by applying the conditional logic

  • Split By Multiple Expressions

Splits the incoming DataFrame into multiple output DataFrames by applying the conditional logic

  • Assert

This node takes in an expression and evaluates it to determine whether the condition is met; based on the result it sends the execution to the first or the second output Node

  • Concat Columns

Concatenates the selected columns
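As a sketch of what the imputing nodes do, assuming plain Spark building blocks rather than Fire Insights' internals: constant fill via DataFrame.fillna, mean or median via Spark ML's Imputer:

```python
# Sketch of imputing: constant fill, then mean imputation via pyspark.ml Imputer.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Imputer

spark = SparkSession.builder.appName("impute-sketch").getOrCreate()
df = spark.createDataFrame([(1.0,), (None,), (3.0,)], ["amount"])

# Imputing With Constant: fill missing values with a fixed per-column value
df.fillna({"amount": 0.0}).show()

# Imputing With Mean Value / Imputing With Median: strategy is "mean" or "median"
imputer = Imputer(strategy="mean", inputCols=["amount"], outputCols=["amount_imputed"])
imputer.fit(df).transform(df).show()
```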

Data Validation

  • Validate Address

This node validates USA addresses

  • Compare Datasets

Compares and validates the input datasets

  • Node Schema Validation

This node validates the incoming data against the defined schema.

  • Validate Fields Simple

Validates fields using simple validation rules

  • Validate Fields Advanced

Validates fields using multiple, advanced validation rules
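A minimal schema-validation sketch in plain PySpark, assuming a hypothetical expected schema (this is not Fire Insights' validation engine):

```python
# Sketch: compare an incoming DataFrame's schema to an expected StructType.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("schema-check-sketch").getOrCreate()
df = spark.createDataFrame([(1, "NY")], ["id", "state"])  # Python ints infer as LongType

expected = StructType([
    StructField("id", LongType()),
    StructField("state", StringType()),
])

actual = [(f.name, f.dataType) for f in df.schema.fields]
wanted = [(f.name, f.dataType) for f in expected.fields]
print("schema ok" if actual == wanted else f"schema mismatch: {actual} != {wanted}")
```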

Feature Engineering


  • Date To Age

This node converts a date-column into columns of age (both in years and in days).

  • Moving Window Functions

This node calculates the moving values of selected functions for the field(input column).

  • Word Count

This node computes the total count of characters
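A minimal sketch of a moving window calculation in PySpark, assuming a 3-row moving average over an ordered column:

```python
# Sketch of "Moving Window Functions": moving average over the current
# row and the two preceding rows, ordered by day.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("moving-window-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)], ["day", "sales"]
)

w = Window.orderBy("day").rowsBetween(-2, 0)
df.withColumn("sales_moving_avg", F.avg("sales").over(w)).show()
```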

Code

  • SQL Executer

Runs the given SQL query

  • Jython

Runs any given Jython code. The input dataframe is passed in the variable inDF. The output dataframe should be placed in the variable outDF

  • Pipe Python

Runs any given Python code. It pipes the incoming DataFrame to the Python script. Output back to Spark has to be written out using print.

  • Pipe Python2

Runs any given Python code. It pipes the incoming DataFrame to the Python script. Output back to Spark has to be written out using print.

  • PySpark

Runs any given PySpark code

  • Run HIVEQL

Runs the given HiveQL on the incoming DataFrame

  • Scala

Runs any given Scala code. The input dataframe is passed in the variable inDF. The output dataframe is passed back by registering it as a temporary table.

  • Scala UDF

Runs any given Scala code for UDFs

  • SQL

Runs the given SQL on the incoming DataFrame

  • Unix Shell Commands

Executes shell commands
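The inDF/outDF contract described above maps naturally onto Spark's temporary tables. A minimal sketch of what the SQL node does, with assumed table and column names:

```python
# Sketch of the SQL node contract: register the incoming DataFrame as a
# temporary table, run SQL against it, and hand the result downstream.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-node-sketch").getOrCreate()
inDF = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

inDF.createOrReplaceTempView("in_table")  # hypothetical table name
outDF = spark.sql("SELECT id, upper(value) AS value FROM in_table WHERE id > 1")
outDF.show()
```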

Filter


  • Select Columns

This node creates a new DataFrame that contains only the selected columns

  • Drop Columns

This node creates a new DataFrame by deleting columns specified as an input

  • Filter By Date Range

This node filters Rows within the given date range

  • Filter By String Length

This node filters the Rows within the given string length. The column to be used for determining the string length is specified

  • Filter By Number Range

This node filters Rows within the given number range

  • Node Row Filter By Index

This node creates a new DataFrame containing only the rows with the specified indexes

  • Row Filter

This node creates a new DataFrame containing only rows satisfying the given condition
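In DataFrame terms, these filter nodes reduce to select, drop, and filter calls; a minimal sketch with assumed column names:

```python
# Sketch of the filter-category nodes as plain DataFrame operations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter-sketch").getOrCreate()
df = spark.createDataFrame(
    [("alice", 34, "NY"), ("bob", 17, "CA")], ["name", "age", "state"]
)

df.select("name", "age").show()                 # Select Columns
df.drop("state").show()                         # Drop Columns
df.filter(F.col("age").between(18, 65)).show()  # Filter By Number Range / Row Filter
```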

JoinUnion


  • Geo Join

Joins the incoming DataFrames

  • Join On Columns

Joins the incoming DataFrames on a joinCol

  • Join On Common Column

Joins the incoming DataFrames on a common joinCol

  • Join On Common Columns

Joins the incoming DataFrames on 1 or more common columns

  • Join Using SQL

Registers the incoming DataFrames as temporary tables and executes the SQL provided

  • Union All

Creates a new DataFrame by merging all the rows without removing the duplicates

  • Union Distinct

Creates a new DataFrame by performing a DISTINCT on the result set, eliminating any duplicate rows
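A minimal sketch of the join and union behaviors on two DataFrames; column names are assumptions:

```python
# Sketch of join/union nodes: common-column join, Union All, Union Distinct.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-union-sketch").getOrCreate()
left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "l"])
right = spark.createDataFrame([(2, "x"), (3, "y")], ["id", "r"])

left.join(right, on="id", how="inner").show()  # Join On Common Column(s)

extra = spark.createDataFrame([(1, "a"), (4, "d")], ["id", "l"])
left.union(extra).show()                       # Union All: duplicates kept
left.union(extra).distinct().show()            # Union Distinct: duplicates removed
```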

Data Profiling

  • Columns Cardinality

Distribution of categorical data. Calculates the count of records for each unique value of the specified column.

  • Correlation

Calculates the correlation between two series of data.

  • Cross Tab

Categorical vs. categorical: computes a frequency table between two categorical columns.

  • Flag Outlier

Flags outliers in the selected column using the Box-and-Whisker technique.

  • Null Values In Column

Counts the number of null values in the selected columns.

  • Summary Statistics

Summary statistics provide useful information about the sample data.
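Several of these profiles correspond to DataFrame built-ins; a minimal sketch with assumed columns:

```python
# Sketch of profiling operations: summary stats, correlation, cross-tab, cardinality.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("profiling-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1.0, 2.0, "a", "x"), (2.0, 4.1, "b", "y"), (3.0, 6.2, "a", "x")],
    ["x", "y", "cat", "grp"],
)

df.describe("x", "y").show()           # Summary Statistics
print(df.stat.corr("x", "y"))          # Correlation (Pearson)
df.stat.crosstab("cat", "grp").show()  # Cross Tab: categorical vs. categorical
df.groupBy("cat").count().show()       # Columns Cardinality-style value counts
```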

Visualization

  • Graph Week Day Distribution

Finds the distribution of week days from Date values

  • Graph Year Distribution

Finds the distribution of years from Date values

  • Graph Month Distribution

Finds the distribution of months from Date values

  • HistoGram

Computes a histogram of the data using a number of bins evenly spaced between the minimum and maximum of the specified column.
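The histogram computation matches RDD.histogram with evenly spaced bins; a minimal sketch with assumed data:

```python
# Sketch of the HistoGram computation: evenly spaced bins between min and max.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("histogram-sketch").getOrCreate()
df = spark.createDataFrame([(v,) for v in [1.0, 2.5, 3.0, 7.5, 9.0]], ["value"])

buckets, counts = df.select("value").rdd.map(lambda r: r[0]).histogram(4)
print(buckets)  # bin boundaries, evenly spaced between min and max
print(counts)   # record count per bin
```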
