
DateTime


Name

Description

Date Time Field Extract

It creates a new DataFrame by extracting Date and Time fields.

Date To Age

This node converts a date-column into columns of age (both in years and in days).

Date Difference

This node finds the difference between two dates.

Date To String

This node converts a date/time column to a string using the given format.

String To Date

This node converts string columns to date using the specified date/time format.

String To Unix Time

This node converts a string to Unix Time.

Time Functions

This node extracts year.

Unix Time To String

This node converts Unix Time to String.
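The two Unix Time conversions above can be illustrated with a minimal sketch in plain Python. It is not the nodes' implementation: the function names, the default format string, and the choice to treat every timestamp as UTC are assumptions made for this example.

```python
from datetime import datetime, timezone

# Hypothetical default format; the nodes let the user pick their own.
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def string_to_unix_time(text: str, fmt: str = DEFAULT_FORMAT) -> int:
    """Parse a timestamp string as UTC and return seconds since the Unix epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def unix_time_to_string(seconds: int, fmt: str = DEFAULT_FORMAT) -> str:
    """Format seconds since the Unix epoch as a UTC timestamp string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
```

The two functions are inverses of each other for any string that matches the format, which is a convenient check when choosing a format for the pair of nodes.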



Math


Name

Description

Math Expression

Creates new columns using the specified expressions.

Math Functions

Create new columns or replace the existing ones by using the specified function.


String


Name

Description

String Functions

String Functions Multiple.

Text Case Transformer

This node converts the text of the selected column to upper or lower case.


Parsing


Name

Description

Apache Logs

Reads in Apache Log files from a given path.

Field Splitter

This node splits the string of the specified input column using the specified delimiter.

Fixed Length Fields

Reads in files with fixed-length fields.

Multi Regex Extractor

This node extracts patterns from input columns.

OCR

Performs Optical Character Recognition using the Tesseract Library.

Paragraph Splitter


Parse JSON Col

Parses JSON content in a given column.

Regex Tokenizer

This node creates a new DataFrame by the process of taking text (such as a sentence) and breaking it into individual terms (usually words) based on regular expression.
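As a rough illustration of regex-based tokenization, the sketch below splits text on a pattern and drops empty terms. The function name, the default pattern `\W+`, and the lower-casing option are assumptions for this example, not the node's documented parameters.

```python
import re


def regex_tokenize(text, pattern=r"\W+", to_lowercase=True):
    """Split text into terms, treating every match of `pattern` as a separator."""
    if to_lowercase:
        text = text.lower()
    # re.split can yield empty strings at the edges, so filter them out.
    return [term for term in re.split(pattern, text) if term]
```

With the default pattern, any run of non-word characters (spaces, punctuation) separates two terms.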


Cleaning


Name

Description

Count rows columns


Data Wrangling

This node creates a new DataFrame by applying each of the Rules specified.

Data Cleansing

This node cleanses the selected columns from the dataset.

Dedup

This node is used for problems such as entity resolution or data matching, that is, finding and linking different mentions of the same entity within a single data source or across multiple data sources.

Drop Duplicate Rows

Drops duplicate rows from the incoming DataFrame. Specific columns can be selected to be used when comparing two rows.

Drop Rows With Null

This node creates a new DataFrame by dropping rows containing null values.

Drop Null Rows for Selected Columns

This node creates a new DataFrame by dropping rows containing null values for selected Columns.

Find And Replace Using Regex

This node finds text in a column and replaces it with other text.

Find And Replace Using Regex Advanced

This node finds and replaces text in a column containing a string.

Impute Advanced

This node replaces missing values, or a given value, with a constant value.

Imputing With Constant

This node fills missing values (None) in the selected columns with the constant value given for the corresponding column.

Imputing With Mean Value

This node imputes continuous variables with the mean.

Imputing With Median

This node imputes with the median.

Imputing With Mode Value

This node fills missing values (None) in the selected columns with the most frequently observed value in the corresponding column.

Count Null Values

Counts null values in the specified input columns.

Remove Duplicate Rows

This node takes an array of fields and compares the rows on those fields. For each set of matching rows, it keeps one row chosen at random and drops the rest.

Remove Unwanted Characters

This node removes unwanted characters from the specified input columns.

Remove Unwanted Characters Advanced

This node removes unwanted characters.

Standard Deviation

Creates new columns using the specified input columns.

Value count

Counts values in the specified input columns.
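To make the mode-based imputation concrete, here is a minimal sketch over a list of dictionaries standing in for DataFrame rows. The function name and the row representation are assumptions for illustration only. When several values tie for most frequent, this sketch keeps the one seen first.

```python
from collections import Counter


def impute_with_mode(rows, column):
    """Return copies of `rows` with None in `column` replaced by the column's mode."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        # Nothing to learn a mode from; return the rows unchanged.
        return [dict(row) for row in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode if row[column] is None else row[column]}
        for row in rows
    ]
```

Imputing with a constant, mean, or median follows the same shape: compute one fill value per column, then substitute it wherever the column is None.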


Control Structures


Name

Description

Execute In Loop


Execute Workflow

Starts the given workflow and resumes execution without waiting for that workflow to complete.

Read Parameters

Reads in the parameters from the given file.

Specify Parameters

Provides additional parameters to the workflow when running with spark-submit.


Add Columns


Name

Description

Add Columns

This node allows adding new columns with certain values.

Add Column Advanced

This node allows adding new columns with certain values.

Case When

This node creates a new DataFrame with a new column appended to it, containing a value based on the condition met.

Case When Advanced

This node creates a new DataFrame with a new column appended to it, containing a value based on the condition met.

Concat Columns

This node creates a new DataFrame by concatenating the specified columns of the input DataFrame.

Expressions

This node creates a new DataFrame by adding new columns to the incoming DataFrame as per the Expression computation.

Generate UID

This node generates a new column with a unique index/value for each row in the dataset, assigned per partition. Each partition starts a new range.

Generate UUID

This node generates a Universally Unique ID.

Hash

This node adds a new column which contains the hash of the specified columns.

Row Numbering


Zip With Index

This node generates a new column with a unique index/value for each row in the dataset.
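The difference between one index across the whole dataset (Zip With Index) and IDs where each partition starts a new range (Generate UID) can be sketched as follows. Partitions are modelled as plain Python lists, and the per-partition layout (partition number in the high bits, row offset in the low 33 bits) mirrors what Spark documents for `monotonically_increasing_id`. Whether the nodes use exactly this layout is an assumption.

```python
def zip_with_index(partitions):
    """Assign consecutive indexes 0, 1, 2, ... across all partitions in order."""
    indexed = []
    index = 0
    for partition in partitions:
        for row in partition:
            indexed.append((row, index))
            index += 1
    return indexed


def per_partition_ids(partitions, offset_bits=33):
    """Assign unique IDs where each partition starts its own range.

    The ID is (partition number << offset_bits) + row offset, so IDs are
    unique and increasing but not consecutive across partitions.
    """
    return [
        (row, (partition_number << offset_bits) + offset)
        for partition_number, partition in enumerate(partitions)
        for offset, row in enumerate(partition)
    ]
```

The per-partition scheme needs no coordination between partitions, which is why its IDs have gaps, while a consecutive index requires knowing how many rows precede each partition.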


Split


Name

Description

Compare All Columns

Compares 2 incoming DataFrames. Outputs 3 DataFrames: (A-B), (B-A), and (A intersection B).

Compare All Columns Single Output

Compares 2 incoming DataFrames. Outputs 1 DataFrame (A-B) or (B-A) or (A intersection B) based on user's input.

Compare Specific Columns

Compares 2 incoming DataFrames on specific columns. Outputs 3 DataFrames: (A-B), (B-A), and (A intersection B).

Compare Specific Columns Single Output

Compares 2 incoming DataFrames on specific columns. Outputs 1 DataFrame (A-B) or (B-A) or (A intersection B) based on user's input.

Split By Expression

This node splits the incoming DataFrame into two output DataFrames by applying the conditional logic.

Split By Multiple Expressions

Splits the incoming DataFrame into multiple output DataFrames by applying the conditional logic.
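The (A-B), (B-A), and (A intersection B) outputs of the compare nodes can be sketched with rows modelled as tuples. The function name and the set-style semantics (duplicate rows collapse to one) are assumptions for illustration; the nodes may treat duplicates differently.

```python
def compare_rows(a, b):
    """Return (rows only in a, rows only in b, rows in both), in first-seen order."""
    set_a, set_b = set(a), set(b)
    # dict.fromkeys removes duplicates while preserving order.
    only_a = [row for row in dict.fromkeys(a) if row not in set_b]
    only_b = [row for row in dict.fromkeys(b) if row not in set_a]
    both = [row for row in dict.fromkeys(a) if row in set_b]
    return only_a, only_b, both
```

Comparing on specific columns amounts to projecting each row onto those columns before calling the same function.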


Condition


Name

Description

Assert

This node takes an expression, evaluates it, and based on the result sends the execution to the first or the second output node.

Decision

This node evaluates expressions to determine whether the condition is met, and accordingly either proceeds to the next step or stops.


Cast-Data Type


Name

Description

Cast To Single Type

This node creates a new DataFrame by casting the specified input columns to a new data type.

Cast To Different Types-1

This node creates a new DataFrame by casting the specified columns into new types.

Cast To Different Types-2

This node creates a new DataFrame by casting the specified columns into new types.


Filter


Name

Description

Select Columns

This node creates a new DataFrame that contains only the selected columns.

Drop Columns

This node creates a new DataFrame by dropping the specified columns.

Filter Advanced

This node generates two new DataFrames: one containing rows that meet the specified condition at the lower edge.

Filter By Date Range

This node filters rows within the given date range.

Filter By String Length

This node filters the rows within the given string length. The column to be used for determining the string length is specified.

Filter By Number Range

This node filters the rows within the given number range.

Row Filter

This node creates a new DataFrame containing the rows that satisfy the given condition.

Node Row Filter By Index

This node creates a new DataFrame containing only the rows that satisfy the given condition.

Select


Filter Unique

This node splits the incoming DataFrame into two output DataFrames: one with the unique values and the other with the remaining duplicates.


Group

Name

Description

Cube

Cube Node generates a result set that shows aggregates for all combinations of values in the selected columns.

Group By

Group By Node

Melt

Melt Node

Pivot By

Pivot Node

Rollup

Rollup Node generates a result set that shows aggregates for a hierarchy of values in the selected columns.
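The difference between Cube and Rollup comes down to which column groupings get aggregated. The sketch below only enumerates those groupings; the function names are made up for this example, and the empty tuple stands for the grand total.

```python
from itertools import combinations


def rollup_grouping_sets(columns):
    """Hierarchical groupings: all columns, then drop columns from the right."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """Every combination of the columns, from all of them down to none."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]
```

For n selected columns, Rollup yields n + 1 groupings and Cube yields 2^n, so Cube output grows much faster as columns are added.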


Join-Union

Name

Description

Geo Join

This node joins the incoming DataFrames.

Join On Columns

Joins the incoming DataFrames on the given columns.

Join On Common Column

This node joins the incoming DataFrames using one common column between them.

Join On Common Columns

This node joins the incoming DataFrames on 1 or more columns.

Join Using SQL

This node registers the incoming DataFrames as temporary tables and executes the SQL provided.

Union All

This node creates a new DataFrame by doing a union of all the rows in the incoming DataFrames. It does not remove any duplicates.

Union Distinct

This node creates a new DataFrame by performing a UNION of all the rows in the incoming DataFrames. It then performs DISTINCT on the result set.
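The two union behaviours can be contrasted with a small sketch in which each DataFrame is a list of row tuples. The function names are illustrative only.

```python
def union_all(*frames):
    """Concatenate the rows of every frame, keeping duplicates."""
    return [row for frame in frames for row in frame]


def union_distinct(*frames):
    """Concatenate the rows of every frame, then keep each distinct row once."""
    # dict.fromkeys drops repeats while preserving first-seen order.
    return list(dict.fromkeys(union_all(*frames)))
```

Union Distinct is therefore Union All followed by a de-duplication step, which is the extra cost to weigh when duplicates are harmless.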


Code

Name

Description

Jython

This node runs any given Jython code. The input DataFrame is passed in the variable inDF. The output DataFrame should be placed in the variable outDF.

Multi Input To Multi Output PySpark

This node runs any given PySpark code. The input DataFrames are passed in the variable inDFs. The output array of DataFrames is passed back by registering each DataFrame as a temporary table.

MultiInputPySpark

This node runs any given PySpark code. The input DataFrames are passed in the variable inDFs. The output DataFrame is passed back by registering it as a temporary table.

Pipe Python

This node runs any given Python code. It pipes the incoming DataFrame to the Python script. Output back to Spark must be written using print.

Pipe Python2

This node runs any given Python code. It pipes the incoming DataFrame to the Python script. Output back to Spark must be written using print.

PySpark

This node runs any given PySpark code. The input DataFrame is passed into the function myfn as a parameter.

Run HIVEQL

This node runs the given SQL on the incoming DataFrame.

Run Python Code

This node executes the given Python code.

Run Python File

This node executes the given Python file.

Spark

This node runs any given Scala code. The input DataFrame is passed in the variable inDF. The output DataFrame is passed back by registering it as a temporary table.

Scala UDF

This node runs any given Scala code for UDFs.

SQL

This node runs the given SQL on the incoming DataFrame.

SQL Executer

This node runs the given SQL query.

Unix Shell Commands

This node executes the given shell command.
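A script for the Pipe Python nodes reads rows from standard input and writes results with print. The sketch below assumes each row arrives as one comma-separated line, which is a guess about the wire format and not something this page states. The function names and the upper-casing transform are placeholders.

```python
import sys


def transform_line(line):
    """Example transform: trim each comma-separated field and upper-case it."""
    fields = line.rstrip("\n").split(",")
    return ",".join(field.strip().upper() for field in fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one row per line from stdin and print one transformed row per line."""
    for line in stdin:
        # print is the channel back to Spark, so emit exactly one line per row.
        print(transform_line(line), file=stdout)
```

When used as a standalone script, the file would end with a call to `main()`. Taking the streams as parameters keeps the transform testable without a running Spark job.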


Delta-Capture

Name

Description

Delta Merge


Delta Vacuum


