
Workflow Automation Templates

A library of ready-to-use workflow templates to accelerate your data journey

Data Cleaning and Feature Generation

Prepare and enrich data for modeling

Overview

This workflow combines data cleaning and feature generation to improve data quality before modeling. It removes inconsistencies, standardizes column formats, and derives new features, such as year and month from date columns, for downstream analysis and models to use.

Details

The workflow begins by loading the dataset. It cleans the data by removing nulls and duplicates, converting text to lowercase, and formatting date fields using the DateTime Field Extract node. New features such as year and month are derived from date columns to capture temporal patterns.
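As a rough illustration of the cleaning steps described above, here is a minimal plain-Python sketch. It is not the workflow's actual node implementation: the function name `clean_rows`, the column names, and the date format are all hypothetical. It also lowercases text before removing duplicates, an assumed ordering chosen so that values differing only in case collapse into one row.

```python
from datetime import datetime


def clean_rows(rows, text_cols, date_col, date_format="%Y-%m-%d"):
    """Drop rows with nulls, lowercase text, drop duplicates, add year/month."""
    seen = set()
    cleaned = []
    for row in rows:
        # Remove rows that contain a null (None or empty string).
        if any(value is None or value == "" for value in row.values()):
            continue
        row = dict(row)  # copy so the caller's data is left untouched
        # Convert the chosen text columns to lowercase.
        for col in text_cols:
            row[col] = row[col].lower()
        # Remove duplicates, comparing rows after normalization.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Derive year and month features from the date column.
        parsed = datetime.strptime(row[date_col], date_format)
        row["year"] = parsed.year
        row["month"] = parsed.month
        cleaned.append(row)
    return cleaned


# Hypothetical sample data for illustration only.
rows = [
    {"city": "Boston", "signup_date": "2023-04-15", "amount": 10.0},
    {"city": "BOSTON", "signup_date": "2023-04-15", "amount": 10.0},
    {"city": None, "signup_date": "2023-05-01", "amount": 5.0},
    {"city": "Austin", "signup_date": "2024-01-09", "amount": 7.5},
]
cleaned = clean_rows(rows, text_cols=["city"], date_col="signup_date")
```

With this sample, the row with a missing city is dropped, the two Boston rows collapse into one, and each remaining row gains `year` and `month` fields.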

Next, data transformation and feature engineering steps are applied using Vector Assembler, Standard Scaler, String Indexer, and One Hot Encoder to prepare the dataset for modeling. The Summary and Correlation nodes help assess variable relationships, while Print N Rows displays intermediate outputs for verification.
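To make the transformation steps above more concrete, the sketch below shows in plain Python what string indexing, one-hot encoding, standard scaling, vector assembly, and a correlation check typically compute. The function names and sample values are hypothetical, and details such as frequency-based index ordering and the use of population standard deviation are assumptions that may differ from how the workflow's nodes behave.

```python
import math
from statistics import mean, pstdev


def string_index(values):
    """Map each category to an integer, most frequent first (ties alphabetical)."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: i for i, value in enumerate(ordered)}


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def standard_scale(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Hypothetical sample columns for illustration only.
cities = ["boston", "austin", "boston"]
amounts = [10.0, 20.0, 30.0]

index = string_index(cities)
scaled = standard_scale(amounts)
# Assemble one feature vector per row: one-hot city followed by scaled amount.
features = [
    one_hot(index[city], len(index)) + [amount]
    for city, amount in zip(cities, scaled)
]
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each row ends up as a single numeric vector, which is the form most modeling steps expect. The correlation helper plays the role of a quick relationship check between two numeric columns.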

The result is clean, structured, feature-rich data ready for analysis or machine learning.
