Feature Selection
Name | Description |
Feature Selection With Importance | |
Feature Selection With Correlation | |
SparkML Feature Scaler
Name | Description |
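Min Max Scaler | MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]). |
Min Max Scaler Transform | MinMaxScaler transforms a dataset of Vector rows, rescaling each feature to a specific range (often [0, 1]). |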
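Standard Scaler | StandardScaler transforms a dataset of Vector rows, normalizing each feature to have unit standard deviation and/or zero mean. |
Standard Scaler Transform | StandardScaler transforms a dataset of Vector rows, normalizing each feature to have unit standard deviation and/or zero mean. |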
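The scaling formulas behind these nodes can be sketched in plain Python on a single feature column. This is an illustration of the arithmetic, not the SparkML API; the function names are hypothetical. Two choices are assumptions of this sketch: a constant column is mapped to the midpoint of the target range, and a zero-deviation feature is emitted as 0.0. The standard deviation is the sample (n - 1) version.

```python
import statistics
from typing import List


def min_max_scale(column: List[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Linearly rescale one feature column into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: every value maps to the middle of the target range.
        return [0.5 * (new_min + new_max)] * len(column)
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min for x in column]


def standard_scale(column: List[float], with_mean: bool = False, with_std: bool = True) -> List[float]:
    """Optionally center a feature column on zero and scale it to unit standard deviation."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)  # sample standard deviation (divides by n - 1)
    scaled = []
    for x in column:
        value = x - mean if with_mean else x
        if with_std:
            # A zero-variance feature has no spread to divide by; this sketch emits 0.0.
            value = value / std if std != 0 else 0.0
        scaled.append(value)
    return scaled


feature = [1.0, 2.0, 3.0, 4.0, 5.0]
print(min_max_scale(feature))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale(feature, with_mean=True))
```

The defaults `with_mean=False` and `with_std=True` mirror Spark's StandardScaler; scikit-learn's StandardScaler centers by default and uses the population standard deviation instead.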
SparkML Feature Extraction
Name | Description |
Count Vectorizer | Extracts the vocabulary from a given collection of documents and generates a vector of token counts for each document. |
Hashing TF | Maps a sequence of terms to term frequencies using the hashing trick. |
Markov Chain | |
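R Formula | Selects columns specified by an R model formula (RFormula) and produces a vector column of features and a column of labels. |
Word2Vec | Trains a model that maps each word to a unique fixed-size vector; each document is transformed into a vector by averaging the vectors of its words, for further processing by NLP or machine learning algorithms. |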
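The difference between the two counting extractors can be sketched in plain Python: Count Vectorizer builds an explicit vocabulary, while Hashing TF sends each term to a bucket chosen by a hash, so no vocabulary is stored and distinct terms may collide. This is an illustration, not the Spark API, and the function names are hypothetical. Two assumptions to note: the vocabulary here is sorted alphabetically (Spark's CountVectorizer orders it by term frequency across the corpus), and `zlib.crc32` stands in for the MurmurHash3 function Spark uses.

```python
import zlib
from typing import List, Tuple


def count_vectorize(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a vocabulary from tokenized documents and count each term per document."""
    vocab = sorted({term for doc in docs for term in doc})  # alphabetical, for readability
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        counts = [0] * len(vocab)
        for term in doc:
            counts[index[term]] += 1
        vectors.append(counts)
    return vocab, vectors


def hashing_tf(doc: List[str], num_features: int = 16) -> List[int]:
    """Count terms into a fixed number of buckets chosen by a hash of each term."""
    counts = [0] * num_features
    for term in doc:
        bucket = zlib.crc32(term.encode("utf-8")) % num_features  # collisions are possible
        counts[bucket] += 1
    return counts


docs = [["spark", "ml", "spark"], ["ml", "pipeline"]]
vocab, vectors = count_vectorize(docs)
print(vocab)    # ['ml', 'pipeline', 'spark']
print(vectors)  # [[1, 0, 2], [1, 1, 0]]
print(hashing_tf(docs[0], num_features=8))
```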
SparkML Feature Transformers
Name | Description |
Binarizer | Binarize a column of continuous features given a threshold. |
Bucketizer | Discretizes a column of continuous features into a column of feature buckets, using user-specified split points. |
Bucketizer Transform | Bucketizer Transform. |
IDF | Compute the Inverse Document Frequency (IDF) given a collection of documents. |
Imputer | Imputation estimator for completing missing values, using the mean, median, or mode of the columns in which the missing values are located. |
Imputer Transform | Imputation estimator for completing missing values, using the mean, median, or mode of the columns in which the missing values are located. |
Index String | Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column or from user-supplied labels (which take precedence over ML attributes). |
Index To String | Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column or from user-supplied labels (which take precedence over ML attributes). |
Index To String Transform | Maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column or from user-supplied labels (which take precedence over ML attributes). |
Interaction | This transformer takes in Double and Vector type columns and outputs a flattened vector of their feature interactions. |
Interaction Transform | This transformer takes in Double and Vector type columns and outputs a flattened vector of their feature interactions. |
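MaxAbs Scaler | Rescales each feature individually to range [-1, 1] by dividing by the largest absolute value in each feature. It does not shift or center the data, so sparsity is preserved. |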
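MaxAbs Scaler Transform | Rescales each feature individually to range [-1, 1] by dividing by the largest absolute value in each feature. It does not shift or center the data, so sparsity is preserved. |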
N Gram Transformer | Converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned; when the input array has fewer than n words, no n-grams are returned. |
Normalizer | Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit p-norm (p = 2 by default). |
Normalizer Transform | Normalizer is a Transformer which transforms a dataset of Vector rows, normalizing each Vector to have unit p-norm (p = 2 by default). |
One Hot Encoder | Maps a column of label indices to a column of binary vectors, with at most a single one-value. |
One Hot Encoder Advanced | Maps a column of label indices to a column of binary vectors, with at most a single one-value. |
One Hot Encoder Advanced Transform | Maps a column of label indices to a column of binary vectors, with at most a single one-value. |
One Hot Encoder Transform | Maps a column of label indices to a column of binary vectors, with at most a single one-value. |
Polynomial Expansion | Performs feature expansion in a polynomial space. |
Quantile Discretizer | QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features. |
Quantile Discretizer Transform | QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features. |
Robust Scaler | RobustScaler removes the median and scales the data according to a quantile range (by default the interquartile range, between the 1st and 3rd quartiles). |
Robust Scaler Transform | RobustScaler removes the median and scales the data according to a quantile range (by default the interquartile range, between the 1st and 3rd quartiles). |
Signal Processing | Expects a signal as column input and performs transformations. |
SMOTE | Implementation of SMOTE - Synthetic Minority Over-sampling Technique. |
SQL Transformer | This node runs the given SQL on the incoming DataFrame using Spark ML SQLTransformer. |
Stop Words Remover | Filters out stop words from the input. Null values in the input array are preserved unless null is explicitly added to stopWords. |
String Indexer | StringIndexer encodes a string column of labels to a column of label indices. |
String Indexer Advanced Transform | StringIndexer encodes a string column of labels to a column of label indices. |
String Indexer Advanced | StringIndexer encodes a string column of labels to a column of label indices. |
String Indexer Transform | StringIndexer encodes a string column of labels to a column of label indices. |
Tokenizer | A tokenizer that converts the input string to lowercase and then splits it on whitespace. |
Vector Assembler | Merges multiple columns into a vector column. |
Vector Functions | Vector Functions for transforming Vectors. |
Vector Indexer | Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature. |
Vector Indexer Transform | Vector Indexer indexes categorical features inside of a Vector. It decides which features are categorical and converts them to category indices. The decision is based on the number of distinct values of a feature. |
Word To Score Mapping | Map the original word of hashValue to score. |
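Three of the text transformers above (Tokenizer, Stop Words Remover, N Gram Transformer) chain naturally. The sketch below mimics their documented behavior in plain Python: lowercase and split on whitespace, drop stop words case-insensitively while keeping nulls unless `None` is listed as a stop word, and join consecutive words into space-separated n-grams. It is an illustration only; the function names are hypothetical, and `str.split()` collapses runs of whitespace, which may differ from the node's exact splitting.

```python
from typing import List, Optional, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase the input, then split it on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens: Sequence[Optional[str]], stop_words: Sequence[Optional[str]],
                      case_sensitive: bool = False) -> List[Optional[str]]:
    """Drop stop words; None entries are kept unless None itself is listed as a stop word."""
    drop_nulls = None in stop_words
    words = {w if case_sensitive else w.lower() for w in stop_words if w is not None}
    kept: List[Optional[str]] = []
    for token in tokens:
        if token is None:
            if not drop_nulls:
                kept.append(token)
        elif (token if case_sensitive else token.lower()) not in words:
            kept.append(token)
    return kept


def ngrams(tokens: Sequence[Optional[str]], n: int = 2) -> List[str]:
    """Join each run of n consecutive words with spaces, ignoring None entries."""
    words = [t for t in tokens if t is not None]
    # Fewer than n words (including an empty input) yields no n-grams.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


tokens = tokenize("The quick brown fox")
filtered = remove_stop_words(tokens, ["the", "a"])
print(filtered)               # ['quick', 'brown', 'fox']
print(ngrams(filtered, n=2))  # ['quick brown', 'brown fox']
```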
SparkML Dimensionality Reduction
Name | Description |
PCA | Trains a model to project vectors to a low-dimensional space using PCA. |
SVD | |
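The idea behind PCA can be sketched in plain Python by finding the first principal component: center the data, form the sample covariance matrix, and approximate its leading eigenvector. Power iteration is a swapped-in technique chosen for brevity, not necessarily how the PCA node computes its components, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(rows: Sequence[Sequence[float]], iterations: int = 200) -> List[float]:
    """Approximate the unit-length direction of greatest variance by power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # starting guess; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no dominant direction, stop early
        v = [x / norm for x in w]
    return v


# Points on the line y = x: the component is approximately [0.7071, 0.7071].
print(first_principal_component([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]))
```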
SparkML Split Dataset
Name | Description |
Split | This node splits the incoming DataFrame into two, using the specified fraction. |
Split Probability Column | |
Split With Stratified Sampling | This node splits the incoming DataFrame into two by stratified sampling, using the specified fraction. |
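Both split behaviors can be sketched in plain Python. `random_split` sends each row to the first output with probability equal to the fraction, so output sizes are only approximately proportional; `stratified_split` applies the fraction within each label group, so every class keeps roughly the same proportion in both outputs. The function names, the `label_of` callback, and the fixed seed are assumptions for illustration, not the nodes' parameters.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Hashable, List, Sequence, Tuple


def random_split(rows: Sequence[Any], fraction: float, seed: int = 42) -> Tuple[List[Any], List[Any]]:
    """Assign each row to the first output with probability `fraction`."""
    rng = random.Random(seed)
    first: List[Any] = []
    second: List[Any] = []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


def stratified_split(rows: Sequence[Any], label_of: Callable[[Any], Hashable],
                     fraction: float, seed: int = 42) -> Tuple[List[Any], List[Any]]:
    """Split each label group separately so class proportions are preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    first: List[Any] = []
    second: List[Any] = []
    for label in sorted(groups, key=repr):  # deterministic group order
        group = list(groups[label])
        rng.shuffle(group)
        cut = round(len(group) * fraction)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


rows = [(i, "pos" if i < 20 else "neg") for i in range(100)]
train, test = stratified_split(rows, lambda r: r[1], 0.7)
print(len(train), len(test))  # 70 30
```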
SparkML Feature Selection
Name | Description |
ChiSq Selector | Chi-Squared feature selection. |
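Vector Slicer | Takes a feature vector and outputs a new feature vector containing a sub-array of the original features, selected by index or name. |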
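Chi-squared selection scores each categorical feature by how far its joint counts with the label depart from what independence would predict. A plain-Python sketch follows; the helper names are hypothetical, and for simplicity it ranks features by the raw statistic, whereas Spark's ChiSqSelector ranks by the test's p-value (the two orderings can differ when features have different numbers of categories).

```python
from collections import Counter
from typing import Hashable, List, Sequence


def chi_squared(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic for a categorical feature against the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for y, y_count in label_counts.items():
            expected = f_count * y_count / n  # count expected if feature and label were independent
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def select_top_k(columns: Sequence[Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> List[int]:
    """Return the indices of the k columns with the largest chi-squared statistic."""
    scores = [chi_squared(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


labels = [0, 0, 1, 1]
print(chi_squared([0, 0, 1, 1], labels))  # 4.0 (feature mirrors the label)
print(chi_squared([0, 1, 0, 1], labels))  # 0.0 (feature independent of the label)
print(select_top_k([[0, 1, 0, 1], [0, 0, 1, 1]], labels, k=1))  # [1]
```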
SparkML Clustering
Name | Description |
Gaussian Mixture | This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions with associated mixing weights specifying each component's contribution to the composite. |
K-Means | K-means clustering with support for the scalable k-means++ initialization (k-means parallel) proposed by Bahmani et al. |
LDA | Latent Dirichlet Allocation (LDA), a topic model that infers topics from a collection of text documents given as input data. |
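The core K-Means loop (Lloyd's algorithm) alternates between assigning each point to its nearest center and moving each center to the mean of its points. The plain-Python sketch below picks its initial centers at random from the data for brevity; it does not implement the k-means parallel initialization the Spark node supports, and the function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 50, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm with initial centers sampled from the data points."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(data, k=2)))
```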
SparkML Regression
Name | Description |
AFT Survival Regression | Accelerated failure time (AFT) model which is a parametric survival regression model for censored data. |
Decision Tree Regression | Decision tree learning algorithm for regression. It supports both continuous and categorical features. |
GBT Regression | Gradient-Boosted Trees (GBTs) learning algorithm for regression. It supports both continuous and categorical features. |
Linear Regression | Linear regression, with optional L1, L2, or elastic-net regularization. |
Random Forest Regression | Random forest learning algorithm for regression. It supports both continuous and categorical features. |
XGBoost Regressor | |
SparkML Classification
Name | Description |
Decision Tree Classifier | Decision tree learning algorithm for classification. It supports both binary and multiclass labels. |
GBT Classifier | Gradient-Boosted Trees (GBTs) is a learning algorithm for classification. It supports binary labels. |
Logistic Regression | Logistic regression for binary and multiclass (multinomial) classification. |
MultiLayer Perceptron | Classifier based on the multilayer perceptron, a feedforward neural network of fully connected layers. |
Naive Bayes | Creates a NaiveBayes model. Supports both Multinomial NB, which can handle finitely supported discrete data, and Bernoulli NB. |
Random Forest Classifier | Random forest learning algorithm for classification. It supports both binary and multiclass labels. |
XGBoost Classifier | |
SparkML Collaborative Filtering
Name | Description |
ALS | Alternating Least Squares (ALS) matrix factorization. |
SparkML Modeling
Name | Description |
Binary Classification Evaluator | Evaluator for binary classification. |
Clustering Evaluator | Evaluator for Clustering. |
Cross Validator | This node represents Cross Validator from Spark ML. |
Load MLeap | |
Spark ML Model Load | |
Spark ML Model Save | This node saves the generated ML model to the specified path. |
Multiclass Classification Evaluator | Evaluator for multiclass classification. |
Spark Pipeline | This node represents Pipeline from Spark ML. |
Spark Predict | Predict node takes in a DataFrame and Model and makes predictions. |
Regression Evaluator | Evaluator for regression. |
Spark ML ROC | Produces the ROC curve from the probability and label columns. |
Save MLeap | |
Train Validation Split | This node represents Train Validation Split from Spark ML. |
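The default metric of Spark's BinaryClassificationEvaluator, area under the ROC curve, equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half). The quadratic-time plain-Python sketch below computes it directly from that definition; it is an illustration, not the evaluator's implementation, and the function name is hypothetical.

```python
from typing import Sequence


def area_under_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


print(area_under_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```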
SparkML FreqPattern Mining
Name | Description |
FP Growth | Performs frequent pattern mining using the FP-Growth algorithm. |
H2O
Name | Description |
Extract Probabilities | |
H2O Auto ML | H2O AutoML. |
H2O Clustering Evaluator | Evaluator for Clustering. |
H2O Distributed Random Forest | Distributed Random Forest (DRF) is a powerful classification and regression tool. DRF generates a forest of classification or regression trees. |
H2O Gradient Boosting Machine | Gradient Boosting Machine (for Regression and Classification) is a forward learning ensemble method. |
H2O Generalized Linear Models | Generalized Linear Models (GLM) estimate regression models for outcomes following exponential distributions. |
H2O Generalized Low Rank Models | Generalized Low Rank Models (GLRM) is an algorithm for dimensionality reduction of a dataset. |
H2O Isolation Forest | Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees. |
H2O K-Means | K-Means falls in the general category of clustering algorithms. |
H2O ML Model Load | Loads an H2O MOJO ML model. |
H2O ML Model Save | Saves an H2O MOJO ML model at the specified path. |
H2O Neural Network | H2O Deep Learning is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation. |
H2O PCA | PCA is commonly used to model without regularization or perform dimensionality reduction. It can also be useful to carry out as a preprocessing step before distance-based algorithms such as K-Means since PCA guarantees that all dimensions of a manifold are orthogonal. |
H2O Score | Scores the data using the H2O model. |
H2O Word to Vec | The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output. |
H2O XGBoost | XGBoost is a supervised learning algorithm that implements a process called boosting to yield accurate models. |
Sklearn Classification
Name | Description |
Sklearn Gradient Boosting Classifier | Gradient Boosting Classifier. |
Sklearn Logistic Regression | Logistic Regression is a linear model for classification; this implementation can fit binary, one-vs-rest, or multinomial logistic regression. |
Sklearn Random Forest Classifier | Random Forest Classifier. |
Sklearn Modeling
Name | Description |
Custom Metrics | Computes custom metrics on an aggregated field. |
Sklearn Classification Evaluator | Evaluator for classification. |
Sklearn Model Load From S3 | Loads the Sklearn model stored in pickle format in S3. |
Sklearn Model Load | Loads the Sklearn model stored in a pickle file. |
Sklearn Predict | Predict node takes in a dataframe and model and makes predictions. |
Sklearn Regression Evaluator | Evaluator for regression. |
Sklearn Model Save To S3 | Saves the generated Sklearn model to the specified path in S3, in pickle format. |
Sklearn Model Save | Saves the generated Sklearn model to the specified path as a pickle file. |
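The save and load nodes in this group persist models in Python's pickle format. A minimal round trip with the standard library is sketched below; `MeanModel` is a hypothetical stand-in for a fitted Sklearn estimator, and the temporary file path exists only for the example. Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import List


class MeanModel:
    """Hypothetical stand-in for a fitted estimator: always predicts the training mean."""

    def fit(self, y: List[float]) -> "MeanModel":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n_rows: int) -> List[float]:
        return [self.mean_] * n_rows


def save_model(model: object, path: str) -> None:
    """Serialize the model to `path` in pickle format."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> object:
    """Deserialize a model from a pickle file (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_model(MeanModel().fit([1.0, 2.0, 3.0]), path)
    restored = load_model(path)
    print(restored.predict(2))  # [2.0, 2.0]
```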
Sklearn Pre-Processing
Name | Description |
Sklearn Binarizer | Binarize data (set feature values to 0 or 1) according to a threshold. |
Sklearn Binarizer Transform | Binarize data (set feature values to 0 or 1) according to a threshold. |
Sklearn Label Encoder | Encode labels with value between 0 and n_classes-1. |
MinMax Scaler Inverse Transform | The inverse transform node is used to transform the scaled data back to its original form. |
Sklearn MinMaxScaler | Transforms features by scaling each feature to a given range. |
Sklearn MinMax Scaler Transform | Transforms a DataFrame by scaling each feature to a given range. |
Sklearn Normalizer | Normalize samples individually to unit norm. |
Sklearn Normalizer Transform | Normalize samples individually to unit norm. |
Sklearn OneHotEncoder | Encode categorical integer features as a one-hot numeric array. |
Sklearn Quantile Fit Transform | Transform features using quantiles information. |
Sklearn Quantile Transform | Transform features using quantiles information. |
Standard Scaler Inverse Transform | The inverse transform node is used to transform the scaled data back to its original form. |
Sklearn StandardScaler | Standardizes a dataset by rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1. |
Sklearn StandardScaler Transform | Standardizes a dataset by rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1. |
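The Sklearn Label Encoder behavior described above (labels mapped to integers from 0 to n_classes-1) can be sketched in plain Python. Like scikit-learn's LabelEncoder, the sketch assigns codes in sorted label order and supports mapping codes back to the original labels; the class name is hypothetical and this is not the library's implementation.

```python
from typing import Hashable, List, Sequence


class SimpleLabelEncoder:
    """Map each distinct label to an integer code in [0, n_classes - 1]."""

    def fit(self, labels: Sequence[Hashable]) -> "SimpleLabelEncoder":
        self.classes_ = sorted(set(labels))  # codes follow sorted label order
        self._codes = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels: Sequence[Hashable]) -> List[int]:
        # Raises KeyError for a label that was not seen during fit.
        return [self._codes[label] for label in labels]

    def inverse_transform(self, codes: Sequence[int]) -> List[Hashable]:
        return [self.classes_[code] for code in codes]


encoder = SimpleLabelEncoder().fit(["paris", "paris", "tokyo", "amsterdam"])
print(encoder.classes_)                       # ['amsterdam', 'paris', 'tokyo']
print(encoder.transform(["tokyo", "paris"]))  # [2, 1]
print(encoder.inverse_transform([2, 2, 1]))   # ['tokyo', 'tokyo', 'paris']
```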
Deep Learning
Name | Description |
Dense Layer | |
Keras Model Compile | |
Keras Model Fit | |
Keras Model Sequential | |
Keras Predict | |
Keras Preprocessor | |
Sklearn Data
Name | Description |
Sklearn Polynomial | Polynomial regression is a special case of linear regression. |
Sklearn Optimization
Name | Description |
Optimization | |
Optimization Model Load And Score | |
Generative-AI
Name | Description |
Create Faiss Embeddings | Creates Vector Embeddings. |
Natural Language Query | Query database with natural language. |
Hugging Face Custom Category Sentiment Analysis | Sentiment Analysis with custom categories using models hosted in Hugging Face repository. |
Hugging Face Grammatical Correctness | Grammatical Correctness using models hosted in Hugging Face repository. |
Hugging Face Natural Language Inference | Natural Language Inference using models hosted in Hugging Face repository. |
Hugging Face Question Natural Language Inference | Question Natural Language Inference using models hosted in Hugging Face repository. |
Hugging Face Sentiment Analysis | Sentiment Analysis using models hosted in Hugging Face repository. |
Hugging Face Summarization | Summarization using models hosted in Hugging Face repository. |
Hugging Face Tone Analysis | Tone Analysis using models hosted in Hugging Face repository. |
Summarize PDF | Summarizes PDF documents. |
Query Document | Queries large documents. |
Translate PDF | Translates the PDF files in the input directory and writes the corresponding translated TXT files to the output directory. |
Web Scraper | Scrapes Webpages. |
Sklearn Regression
Name | Description |
Sklearn Bayesian Ridge Regression | Bayesian regression provides a natural mechanism for handling insufficient or poorly distributed data by formulating linear regression using probability distributions rather than point estimates. The output or response ‘y’ is assumed to be drawn from a probability distribution rather than estimated as a single value. |
Sklearn Gradient Boosting Regression | Gradient Boosting Regression. |
Sklearn Lasso Regression | Linear regression with L1 regularization (Lasso), which can shrink some coefficients to exactly zero. |
Sklearn Random Forest Regression | Random Forest Regression. |
Sklearn Ridge Regression | Ridge Regression. |
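The effect of the Ridge penalty can be seen in a one-feature closed form. For the objective ||y - Xw||^2 + alpha * ||w||^2 with an unpenalized intercept, centering x and y gives slope = Sxy / (Sxx + alpha) and intercept = mean(y) - slope * mean(x); with alpha = 0 this reduces to ordinary least squares. The plain-Python sketch below handles a single feature only, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ridge_1d(xs: Sequence[float], ys: Sequence[float], alpha: float = 1.0) -> Tuple[float, float]:
    """Closed-form ridge fit y = slope * x + intercept for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha > 0 shrinks the slope toward zero
    intercept = y_mean - slope * x_mean  # the intercept is not penalized
    return slope, intercept


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
print(ridge_1d(xs, ys, alpha=0.0))  # (2.0, 1.0): ordinary least squares
print(ridge_1d(xs, ys, alpha=5.0))  # (1.0, 2.5): slope shrunk by the penalty
```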
PyCaret
Name | Description |
PyCaret AutoML Classification | |
PyCaret AutoML Regression | |
TimeSeries
Name | Description |
Arima | Fits an ARIMA model, selecting its order automatically (AutoARIMA). |
Arima Forecast | Forecast by calling the forecast() or the predict() functions on the Arima object returned from calling fit. |
Arima Model Load | This node loads the Arima model stored in the pickle file. |
Arima Model Save | This node saves the generated Arima model to the specified path as a pickle file. |
LSTM | |
Prophet | |
Prophet Cross Validator | |
Prophet Model Load | This node loads the Prophet model stored in the pickle file. |
Prophet Make Future Dataframe | |
Prophet Predict | |
Prophet Model Save | This node saves the generated Prophet model to the specified path as a pickle file. |
Sarimax | Seasonal Autoregressive Integrated Moving Average with exogenous regressors (SARIMAX). |
Sarimax Forecast | Forecast by calling the forecast() or the predict() functions on the SARIMAXResults object returned from calling fit. |
Sarimax Model Load | This node loads the Sarimax model stored in the pickle file. |
Sarimax Model Save | This node saves the generated Sarimax model to the specified path as a pickle file. |
TS Decompose | |
VAR | |
VarForecast | |
VAR Model Load | This node loads the VAR model stored in the pickle file. |
VAR Model Save | This node saves the generated VAR model to the specified path as a pickle file. |
ScoreCard
Name | Description |
Binning Scorecard | |
Variable Selection Scorecard | |
OpenNLP
Name | Description |
Open NLP Document Categorizer | This node classifies text into pre-defined categories using OpenNLP. It takes an OpenNLP model as input. |
Open NLP Name Finder | This node finds names using OpenNLP. It takes an OpenNLP model as input. |
Open NLP Sentence Detector | This node detects sentences using OpenNLP. It takes an OpenNLP model as input. |