Workflow Automation Templates

A library of ready-to-use workflow templates to accelerate your data journey

Over Sampling - SMOTE

Balance datasets using synthetic sampling

Overview

This workflow implements SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance by generating synthetic samples for underrepresented classes. It is intended mainly for classification tasks, where a balanced training set can improve model fairness and minority-class performance.

Details

The process begins by loading the imbalanced dataset. The Vector Assembler node combines the feature columns into a single vector column suitable for modeling. The SMOTE node then reads the label field to identify the minority class and creates synthetic minority examples by interpolating between existing minority instances and their nearest minority-class neighbors.
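The assembly and interpolation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the workflow's own implementation: rows are assumed to be dictionaries, a feature vector is a tuple of floats, and the names `assemble_features`, `k_nearest` and `smote` are hypothetical. Each synthetic point is placed at a random position on the line segment between a randomly chosen minority sample and one of its k nearest minority-class neighbors, and samples are generated until the minority class count matches the majority class count.

```python
import math
import random
from collections import Counter


# Hypothetical helpers for illustration; not the workflow's actual node code.
def assemble_features(rows, feature_cols):
    """Combine the named columns of each row into one feature vector (a tuple of floats)."""
    return [tuple(float(row[col]) for col in feature_cols) for row in rows]


def k_nearest(i, points, k):
    """Return the k points closest to points[i] by Euclidean distance, excluding points[i]."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return [points[j] for j in others[:k]]


def smote(vectors, labels, minority_label, k=5, seed=0):
    """Add synthetic minority vectors until the minority count equals the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = [v for v, y in zip(vectors, labels) if y == minority_label]
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples to interpolate")
    n_needed = max(counts.values()) - counts[minority_label]
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbor = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # position along the segment from base (0) toward neighbor (1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return vectors + synthetic, labels + [minority_label] * len(synthetic)


# Usage: 10 majority rows (label 0) and 3 minority rows (label 1).
rows = [{"x": float(i), "y": float(i % 3), "label": 0} for i in range(10)]
rows += [
    {"x": 20.0, "y": 0.0, "label": 1},
    {"x": 22.0, "y": 2.0, "label": 1},
    {"x": 21.0, "y": 1.0, "label": 1},
]
vectors = assemble_features(rows, ["x", "y"])
labels = [row["label"] for row in rows]
balanced_x, balanced_y = smote(vectors, labels, minority_label=1, k=2)
print(dict(Counter(balanced_y)))  # {0: 10, 1: 10}
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region the minority class already occupies. The brute-force neighbor search is quadratic in the number of minority samples, which is fine for illustration; libraries such as imbalanced-learn provide a `SMOTE` class (with a `k_neighbors` parameter and a `fit_resample` method) that is better suited to real data.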

After balancing, the Drop Rows With Null node removes any rows that contain null values to keep the data consistent, and the Print N Rows node displays the first N rows of the balanced dataset for verification.
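The two post-processing steps can be sketched the same way. Again this is a hypothetical plain-Python illustration, not the nodes' implementation: it assumes rows are dictionaries, treats both `None` and float `NaN` as null, and assumes the target column is named `label`. Printing the per-label counts alongside the sample rows is an added convenience for checking the balance.

```python
import math
from collections import Counter


# Hypothetical helpers for illustration; not the workflow's actual node code.
def is_null(value):
    """Treat None and float NaN as null values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_with_null(rows):
    """Keep only the rows in which no field is null."""
    return [row for row in rows if not any(is_null(v) for v in row.values())]


def print_n_rows(rows, n=5, label_col="label"):
    """Print the first n rows, then the row count per label to check the balance."""
    for row in rows[:n]:
        print(row)
    print(dict(Counter(row[label_col] for row in rows)))


rows = [
    {"x": 1.0, "label": 0},
    {"x": None, "label": 0},
    {"x": float("nan"), "label": 1},
    {"x": 2.5, "label": 1},
]
clean = drop_rows_with_null(rows)
print_n_rows(clean, n=2)
# {'x': 1.0, 'label': 0}
# {'x': 2.5, 'label': 1}
# {0: 1, 1: 1}
```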

This workflow helps reduce model bias toward the majority class and can improve predictive performance on the minority class in imbalanced datasets.
