
Feature Selection


Name

Description

Feature Selection With Importance


Feature Selection With Correlation



SparkML Feature Scaler


Name

Description

Min Max Scaler

MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]).

Min Max Scaler Transform

MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]).

Min Max Scaler Transform

MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]).

Min Max Scaler

MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]).

Standard Scaler

StandardScaler transforms a dataset of Vector rows, normalizing each feature to have unit standard deviation and/or zero mean.

Standard Scaler Transform

StandardScaler transforms a dataset of Vector rows, normalizing each feature to have unit standard deviation and/or zero mean.
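
The arithmetic behind the two scalers above can be sketched in plain Python. This is only an illustration on a single feature column, not the code the nodes run; the function names, the midpoint rule for constant columns, and the use of the sample standard deviation are assumptions made for the sketch.

```python
import statistics

def min_max_scale(column, lo=0.0, hi=1.0):
    """Rescale one feature column linearly into the range [lo, hi]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Assumption: a constant column is mapped to the midpoint of the range.
        return [0.5 * (lo + hi) for _ in column]
    return [(x - c_min) / (c_max - c_min) * (hi - lo) + lo for x in column]

def standard_scale(column, with_mean=True, with_std=True):
    """Center and/or scale one feature column (needs at least two values)."""
    mean = statistics.mean(column) if with_mean else 0.0
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(column) if with_std else 1.0
    if std == 0:
        std = 1.0  # avoid dividing by zero on constant columns
    return [(x - mean) / std for x in column]

if __name__ == "__main__":
    print(min_max_scale([1.0, 2.0, 3.0]))
    print(standard_scale([1.0, 2.0, 3.0]))
```

Both functions work per column, which matches the idea that each feature is rescaled independently of the others.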


SparkML Feature Extraction


Name

Description

Count Vectorizer

Extracts the vocabulary from a given collection of documents and generates a vector of token counts for each document.

Hashing TF

Maps a sequence of terms to term frequencies using the hashing trick.

Markov Chain


R Formula

RFormula feature selection.

Word2Vec

Transforms vectors of words into vectors of numeric codes for the purpose of further processing by NLP or machine learning algorithms.
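
To make the difference between Count Vectorizer and Hashing TF concrete, here is a minimal plain-Python sketch. It is not the Spark implementation: the helper names are hypothetical, the vocabulary tie-break is an assumption, and `zlib.crc32` stands in for whatever hash function the real transformer uses.

```python
import zlib
from collections import Counter

def fit_vocabulary(docs):
    """Learn a vocabulary from tokenized documents, most frequent term first."""
    counts = Counter(tok for doc in docs for tok in doc)
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def count_vectorize(doc, vocab):
    """Return the token counts of one document, aligned with the vocabulary."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]

def hashing_tf(doc, num_features=16):
    """Map terms to term frequencies with the hashing trick (no vocabulary)."""
    vec = [0] * num_features
    for tok in doc:
        # Assumption: CRC32 is used here only because it is deterministic.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1
    return vec

if __name__ == "__main__":
    docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
    vocab = fit_vocabulary(docs)
    print(vocab, count_vectorize(docs[1], vocab))
    print(hashing_tf(docs[1]))
```

The vocabulary approach needs a fitting pass over the documents, while the hashing approach needs none but can map different terms to the same index.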


SparkML Feature Transformers


Name

Description

Binarizer

Binarize a column of continuous features given a threshold.

Bucketizer

The Bucketizer transformer in PySpark is used to discretize continuous features into categorical ones by creating a fixed number of buckets.

Bucketizer Transform

Bucketizer Transform.

IDF

Compute the Inverse Document Frequency (IDF) given a collection of documents.

Imputer

Imputation estimator for completing missing values.

Imputer

Imputation estimator for completing missing values.

Imputer Transform

Imputation estimator for completing missing values.

Imputer Transform

Imputation estimator for completing missing values.

Index String

Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column, or from user-supplied labels (which take precedence over ML attributes).

Index To String

Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column, or from user-supplied labels (which take precedence over ML attributes).

Index To String Transform

Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column, or from user-supplied labels (which take precedence over ML attributes).

Interaction

This transformer takes in Double and Vector type columns and outputs a flattened vector of their feature interactions.

Interaction Transform

This transformer takes in Double and Vector type columns and outputs a flattened vector of their feature interactions.

MaxAbs Scaler

Rescale each feature individually to range [-1, 1] by dividing through the largest maximum absolute value in each feature.

MaxAbs Scaler

Rescale each feature individually to range [-1, 1] by dividing through the largest maximum absolute value in each feature.

MaxAbs Scaler Transform

Rescale each feature individually to range [-1, 1] by dividing through the largest maximum absolute value in each feature.

MaxAbs Scaler Transform

Rescale each feature individually to range [-1, 1] by dividing through the largest maximum absolute value in each feature.

N Gram Transformer

Converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array is shorter than n, no n-grams are returned.

Normalizer

Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit norm.

Normalizer

Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit norm.

Normalizer Transform

Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit norm.

One Hot Encoder

Maps a column of label indices to a column of binary vectors.

One Hot Encoder Advanced

Maps a column of label indices to a column of binary vectors.

One Hot Encoder Advanced Transform

Maps a column of label indices to a column of binary vectors.

One Hot Encoder

Maps a column of label indices to a column of binary vectors.

One Hot Encoder Transform

Maps a column of label indices to a column of binary vectors.

Polynomial Expansion

Perform feature expansion in a polynomial space.

Quantile Discretizer

QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features.

Quantile Discretizer Transform

QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features.

Robust Scaler

RobustScaler removes the median and scales the data according to the quantile range.

Robust Scaler

RobustScaler removes the median and scales the data according to the quantile range.

Robust Scaler Transform

RobustScaler removes the median and scales the data according to the quantile range.

Robust Scaler Transform

RobustScaler removes the median and scales the data according to the quantile range.

Signal Processing

Expects a signal as column input and performs transformations.

SMOTE

Implementation of SMOTE - Synthetic Minority Over-sampling Technique.

SQL Transformer

This node runs the given SQL on the incoming DataFrame using Spark ML SQLTransformer.

Stop Words Remover

Filters out stop words from input. Null values from input array are preserved unless adding null to stopWords explicitly.

String Indexer

StringIndexer encodes a string column of labels to a column of label indices.

String Indexer Advanced Transform

StringIndexer encodes a string column of labels to a column of label indices.

String Indexer Advanced

StringIndexer encodes a string column of labels to a column of label indices.

String Indexer

StringIndexer encodes a string column of labels to a column of label indices.

String Indexer Transform

StringIndexer encodes a string column of labels to a column of label indices.

Tokenizer

A tokenizer that converts the input string to lowercase and then splits it by white spaces.

Vector Assembler

Merges multiple columns into a vector column.

Vector Functions

Vector Functions for transforming Vectors.

Vector Indexer

Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature.

Vector Indexer

Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature.

Vector Indexer Transform

Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature.

Vector Indexer Transform

Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature.

Word To Score Mapping

Map the original word of hashValue to score.
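
Several of the transformers in this table reduce to a few lines of logic. The sketch below illustrates Binarizer, Bucketizer, N Gram Transformer, String Indexer and One Hot Encoder in plain Python. It is an illustration under stated assumptions, not the Spark ML code: the function names are hypothetical, the alphabetical tie-break in the string index is an assumption, and `drop_last=True` mirrors the documented Spark default of dropping the last category.

```python
from bisect import bisect_right
from collections import Counter

def binarize(values, threshold):
    """Values strictly above the threshold become 1.0, all others 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]

def bucketize(value, splits):
    """Return the bucket index; buckets are [a, b) except the last, which is [a, b]."""
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value outside the split range")
    if value == splits[-1]:
        return len(splits) - 2
    return bisect_right(splits, value) - 1

def ngrams(tokens, n):
    """Space-joined n-grams; shorter inputs yield an empty list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fit_string_index(labels):
    """Most frequent label gets index 0 (ties broken alphabetically: an assumption)."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: idx for idx, lab in enumerate(ordered)}

def one_hot(index, num_categories, drop_last=True):
    """Binary vector with a single 1.0; the last category is all zeros if dropped."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec

if __name__ == "__main__":
    mapping = fit_string_index(["x", "y", "y"])
    print(mapping, [one_hot(mapping[lab], len(mapping)) for lab in ["x", "y"]])
    print(bucketize(10, [0, 10, 20]), ngrams(["a", "b", "c"], 2))
```

Chaining `fit_string_index` and `one_hot` mirrors the common pattern of running String Indexer before One Hot Encoder.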


SparkML Dimensionality Reduction


Name

Description

PCA

Trains a model to project vectors to a low-dimensional space using PCA.

SVD



SparkML Split Dataset


Name

Description

Split

This node splits the incoming DataFrame into 2. It takes in the fraction to use in splitting the data.

Split Probability Column


Split With Stratified Sampling

This node splits the incoming DataFrame into 2. It takes in the fraction to use in splitting the data by Stratified Sampling.
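
The idea of a stratified split can be sketched as follows: rows are grouped by label and each group is split with the same fraction, so both outputs keep roughly the original class proportions. This is a plain-Python illustration, not the node's implementation; the function name, the `label_of` callback, the fixed seed, and the use of exact per-class counts (rather than random per-row sampling) are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, fraction, seed=42):
    """Split rows into two lists, applying `fraction` within each label group."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    first, second = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        cut = round(fraction * len(members))
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second

if __name__ == "__main__":
    rows = [("a", i) for i in range(10)] + [("b", i) for i in range(20)]
    train, test = stratified_split(rows, lambda r: r[0], 0.7)
    print(len(train), len(test))
```

A plain split with the same fraction would give the same overall sizes, but could leave a rare class under-represented in one of the two outputs.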


SparkML Feature Selection


Name

Description

ChiSq Selector

Chi-Squared feature selection.

Vector Slicer

VectorSlicer feature selection.
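
As a rough illustration of the two selectors above, the sketch below computes a chi-squared statistic between a categorical feature and a label, keeps the highest-scoring features, and slices a vector by index. It is a simplified plain-Python sketch, not the Spark ML code: the function names are hypothetical, and ranking by the raw statistic (instead of a p-value) is an assumption.

```python
from collections import Counter

def chi_square(feature, label):
    """Pearson chi-squared statistic for two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

def select_top_features(columns, label, k):
    """Indices of the k columns with the largest chi-squared statistic."""
    ranked = sorted(range(len(columns)),
                    key=lambda i: -chi_square(columns[i], label))
    return sorted(ranked[:k])

def vector_slice(vector, indices):
    """Keep only the entries at the given indices, in the given order."""
    return [vector[i] for i in indices]

if __name__ == "__main__":
    label = [0, 0, 1, 1]
    columns = [[0, 0, 1, 1], [0, 1, 0, 1]]
    keep = select_top_features(columns, label, 1)
    print(keep, vector_slice([3.5, 7.0], keep))
```

The first function chooses indices from the data, while the slicer applies indices that are already known.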


SparkML Clustering


Name

Description

Gaussian Mixture

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions with associated mixing weights specifying each's contribution to the composite.

K-Means

K-means clustering with support for k-means|| initialization proposed by Bahmani et al.

LDA

LDA is given a collection of documents as input data.
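
For orientation, the core loop of K-Means (assign each point to its nearest center, then move each center to the mean of its points) can be sketched in plain Python. This is a minimal sketch, not the Spark ML implementation: the initial centers are passed in by the caller, so the k-means|| initialization mentioned above is not shown, and the function names are hypothetical.

```python
def closest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )

def k_means(points, centers, max_iter=20):
    """Iterate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # An empty cluster keeps its previous center (an assumption of this sketch).
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(k_means(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

The result depends on the starting centers, which is why the choice of initialization matters in practice.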


SparkML Regression


Name

Description

AFT Survival Regression

Accelerated failure time (AFT) model which is a parametric survival regression model for censored data.

Decision Tree Regression

It supports both continuous and categorical features.

GBT Regression

It supports both continuous and categorical features.

Linear Regression

The interface for working with linear regression models and model summaries is similar to the logistic regression case.

Random Forest Regression

It supports both continuous and categorical features.

XGBoost Regressor



SparkML Classification


Name

Description

Decision Tree Classifier

It supports both binary and multiclass labels.

GBT Classifier

Gradient-Boosted Trees (GBTs) is a learning algorithm for classification. It supports binary labels.

Logistic Regression

Logistic regression.

MultiLayer Perceptron

It supports creation of a fully connected neural network.

Naive Bayes

Creates a NaiveBayes model. Supports both Multinomial NB, which can handle finitely supported discrete data, and Bernoulli NB, which takes binary (0/1) features.

Random Forest Classifier

Supports both binary and multiclass labels.

XGBoost Classifier



SparkML Collaborative Filtering


Name

Description

ALS

Alternating Least Squares (ALS) matrix factorization.


SparkML Modeling


Name

Description

Binary Classification Evaluator

Evaluator for binary classification.

Clustering Evaluator

Evaluator for Clustering.

Cross Validator

This node represents Cross Validator from Spark ML.

Load MLeap


Spark ML Model Load


Spark ML Model Save

This node saves the ML model generated at the specified path.

Multiclass Classification Evaluator

Evaluator for multiclass classification.

Spark Pipeline

This node represents Pipeline from Spark ML.

Spark Predict

Predict node takes in a DataFrame and Model and makes predictions.

Regression Evaluator

Evaluator for regression.

Spark ML ROC

It produces the ROC curve based on the probability and label.

Save MLeap


Train Validation Split

This node represents Train Validation Split from Spark ML.
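
The ROC curve and the area under it, which the ROC node and the Binary Classification Evaluator deal with, can be computed from labels and predicted probabilities as sketched below. This is a plain-Python illustration, not the Spark ML code; the function names are hypothetical, labels are assumed to be 0 or 1, and the area is computed with the trapezoid rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, starting at (0, 0)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Rows sharing a score are consumed together so ties give one point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

if __name__ == "__main__":
    pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(pts, area_under_curve(pts))
```

A model that ranks every positive above every negative reaches an area of 1.0, while random scores tend toward 0.5.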


SparkML FreqPattern Mining


Name

Description

FP Growth

Performs frequent pattern mining using the FP-Growth algorithm.


H2O


Name

Description

Extract Probabilities


H2O Auto ML

H2O AutoML.

H2O Clustering Evaluator

Evaluator for Clustering.

H2O Distributed Random Forest

Distributed Random Forest (DRF) is a powerful classification and regression tool. DRF generates a forest of classification or regression trees.

H2O Distributed Random Forest

Distributed Random Forest (DRF) is a powerful classification and regression tool. DRF generates a forest of classification or regression trees.

H2O Gradient Boosting Machine

Gradient Boosting Machine (for Regression and Classification) is a forward learning ensemble method.

H2O Generalized Linear Models

Generalized Linear Models (GLM) estimate regression models for outcomes following exponential distributions.

H2O Generalized Low Rank Models

Generalized Low Rank Models (GLRM) is an algorithm for dimensionality reduction of a dataset.

H2O Isolation Forest

Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees.

H2O K-Means

K-Means falls in the general category of clustering algorithms.

H2O ML Model Load

Loads an H2O MOJO ML model.

H2O ML Model Save

Saves an H2O MOJO ML model at the specified path.

H2O Neural Network

H2O Deep Learning is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation.

H2O PCA

PCA is commonly used to model without regularization or perform dimensionality reduction. It can also be useful to carry out as a preprocessing step before distance-based algorithms such as K-Means since PCA guarantees that all dimensions of a manifold are orthogonal.

H2O Score

Scores the data using the H2O model.

H2O Word to Vec

The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output.

H2O XGBoost

XGBoost is a supervised learning algorithm that implements a process called boosting to yield accurate models.


SkLearn Classification


Name

Description

Sklearn Gradient Boosting Classifier

Gradient Boosting Classifier.

Sklearn Logistic Regression

Logistic Regression is a linear model for classification. The implementation can fit binary, One-vs-Rest, or multinomial logistic regression.

Sklearn Random Forest Classifier

Random Forest Classifier.


Sklearn Modeling


Name

Description

Custom Metrics

Custom Metrics to check on aggregated field.

Sklearn Classification Evaluator

Evaluator for classification.

Sklearn Model Load From S3

Loads the Sklearn model stored in pickle format in S3.

Sklearn Model Load

Loads the Sklearn model stored in a pickle file.

Sklearn Predict

Predict node takes in a dataframe and model and makes predictions.

Sklearn Regression Evaluator

Evaluator for regression.

Sklearn Model Save To S3

Saves the generated Sklearn model at the specified path in S3 in pickle format.

Sklearn Model Save

Saves the generated Sklearn model at the specified path in a pickle file.


Sklearn Pre-Processing


Name

Description

Sklearn Binarizer

Binarize data (set feature values to 0 or 1) according to a threshold.

Sklearn Binarizer Transform

Binarize data (set feature values to 0 or 1) according to a threshold.

Sklearn Label Encoder

Encode labels with value between 0 and n_classes-1.

MinMax Scaler Inverse Transform

The inverse transform node is used to transform the scaled data back to its original form.

Sklearn MinMaxScaler

Transforms features by scaling each feature to a given range.

Sklearn MinMax Scaler Transform

Transforms a DataFrame using a fitted MinMaxScaler.

Sklearn Normalizer

Normalize samples individually to unit norm.

Sklearn Normalizer Transform

Normalize samples individually to unit norm.

Sklearn OneHotEncoder

Encode categorical integer features as a one-hot numeric array.

Sklearn Quantile Fit Transform

Transform features using quantiles information.

Sklearn Quantile Transform

Transform features using quantiles information.

Standard Scaler Inverse Transform

The inverse transform node is used to transform the scaled data back to its original form.

Sklearn StandardScaler

Standardizing a dataset involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1.

Sklearn StandardScaler Transform

Standardizing a dataset involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1.
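
The scaler entries in this table come in fit, transform and inverse transform variants. The relationship between the three steps can be sketched as below for a single column. This is a plain-Python illustration rather than scikit-learn itself; the class name is hypothetical, and the population standard deviation with a fallback of 1.0 for constant columns is used on the assumption that it matches scikit-learn's behaviour.

```python
import statistics

class StandardScalerSketch:
    """Single-column stand-in for a fit / transform / inverse_transform scaler."""

    def fit(self, column):
        # Learn the statistics once, from the training data only.
        self.mean_ = statistics.fmean(column)
        # Population standard deviation; constant columns fall back to 1.0.
        self.scale_ = statistics.pstdev(column) or 1.0
        return self

    def transform(self, column):
        # Apply the learned statistics to any data, including new rows.
        return [(x - self.mean_) / self.scale_ for x in column]

    def inverse_transform(self, scaled):
        # Undo the scaling to get back to the original units.
        return [z * self.scale_ + self.mean_ for z in scaled]

if __name__ == "__main__":
    scaler = StandardScalerSketch().fit([2.0, 4.0, 6.0])
    scaled = scaler.transform([2.0, 4.0, 6.0])
    print(scaled)
    print(scaler.inverse_transform(scaled))
```

Keeping the fitted statistics separate from the transform step is what lets predictions made on scaled data be mapped back to the original scale later.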


Deep Learning


Name

Description

Dense Layer


Keras Model Compile


Keras Model Fit


Keras Model Sequential


Keras Predict


Keras Preprocessor



Sklearn Data


Name

Description

Sklearn Polynomial

Polynomial regression is a special case of linear regression.


Sklearn Optimization


Name

Description

Optimization


Optimization Model Load And Score



Generative-AI


Name

Description

Create Faiss Embeddings

Creates Vector Embeddings.

Natural Language Query

Query database with natural language.

Hugging Face Custom Category Sentiment Analysis

Sentiment Analysis with custom categories using models hosted in Hugging Face repository.

Hugging Face Grammatical Correctness

Grammatical Correctness using models hosted in Hugging Face repository.

Hugging Face Natural Language Inference

Natural Language Inference using models hosted in Hugging Face repository.

Hugging Face Question Natural Language Inference

Question Natural Language Inference using models hosted in Hugging Face repository.

Hugging Face Sentiment Analysis

Sentiment Analysis using models hosted in Hugging Face repository.

Hugging Face Summarization

Summarization using models hosted in Hugging Face repository.

Hugging Face Tone Analysis

Tone Analysis using models hosted in Hugging Face repository.

Summarize PDF

Summarizes PDF documents.

Query Document

Queries large documents.

Translate PDF

Translates the PDF files from the input directory and outputs corresponding translated txt files in the output directory.

Web Scraper

Scrapes web pages.


Sklearn Regression


Name

Description

Sklearn Bayesian Ridge Regression

Bayesian regression provides a natural mechanism for coping with insufficient or poorly distributed data by formulating linear regression using probability distributions rather than point estimates. The output or response 'y' is assumed to be drawn from a probability distribution rather than estimated as a single value.

Sklearn Gradient Boosting Regression

Gradient Boosting Regression.

SkLearn Lasso Regression

Lasso Regression.

Sklearn Random Forest Regression

Random Forest Regression.

Sklearn Ridge Regression

Ridge Regression.


PyCaret


Name

Description

PyCaret AutoML Classification


PyCaret AutoML Regression



TimeSeries


Name

Description

Arima

AutoARIMA.

Arima Forecast

Forecast by calling the forecast() or the predict() functions on the Arima object returned from calling fit.

Arima Model Load

This node loads the Arima model stored in the pickle file.

Arima Model Save

This node saves the generated Arima model at the specified path in a pickle file.

LSTM


Prophet


Prophet Cross Validator


Prophet Model Load

This node loads the Prophet model stored in the pickle file.

Prophet Make Future Dataframe


Prophet Predict


Prophet Model Save

This node saves the generated Prophet model at the specified path in a pickle file.

Sarimax

Seasonal Autoregressive Integrated Moving Average.

Sarimax Forecast

Forecast by calling the forecast() or the predict() functions on the SARIMAXResults object returned from calling fit.

Sarimax Model Load

This node loads the Sarimax model stored in the pickle file.

Sarimax Model Save

This node saves the generated Sarimax model at the specified path in a pickle file.

TS Decompose


VAR


VarForecast


VAR Model Load

This node loads the VAR model stored in the pickle file.

VAR Model Save

This node saves the generated VAR model at the specified path in a pickle file.


ScoreCard


Name

Description

Binning Scorecard


Variable Selection Scorecard



OpenNLP


Name

Description

Open NLP Document Categorizer

This node classifies text into pre-defined categories using OpenNLP. It takes in the OpenNLP model.

Open NLP Name Finder

This node finds names using OpenNLP. It takes in the OpenNLP model.

Open NLP Sentence Detector

This node detects sentences using OpenNLP. It takes in the OpenNLP model.

