Workflow Automation Templates
A library of ready-to-use workflow templates to accelerate your data journey
Data Cleaning and Feature Generation
Prepare and enrich data for modeling

Overview
This workflow combines data cleaning and feature generation to improve data quality and enhance model performance. It removes nulls and duplicates, standardizes text and date columns, and derives new features, such as year and month, that capture patterns the raw columns do not expose directly.
Details
The workflow begins by loading the dataset. It cleans the data by removing nulls and duplicates, converting text to lowercase, and formatting date fields using the DateTime Field Extract node. New features such as year and month are derived from date columns to capture temporal patterns.
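The cleaning and date-extraction steps above can be sketched in plain Python. This is only an illustration of the logic, not the workflow's own implementation: the function name `clean_rows`, the column names, and the date format are assumptions made for the example, and duplicates are checked after lowercasing so that values differing only in case are treated as the same row.

```python
from datetime import datetime


def clean_rows(rows, text_fields, date_field, date_format="%Y-%m-%d"):
    """Drop rows with nulls, lowercase text, drop duplicates, add year/month.

    Hypothetical helper for illustration; the actual nodes may behave differently.
    """
    cleaned = []
    seen = set()
    for row in rows:
        # Remove rows that contain any null (None) value.
        if any(value is None for value in row.values()):
            continue
        row = dict(row)  # copy so the input rows are not modified
        # Convert the chosen text fields to lowercase.
        for field in text_fields:
            row[field] = row[field].lower()
        # Remove duplicates, compared on the lowercased values.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Derive year and month features from the date column.
        parsed = datetime.strptime(row[date_field], date_format)
        row["year"] = parsed.year
        row["month"] = parsed.month
        cleaned.append(row)
    return cleaned


# Made-up sample data: one case-only duplicate and one row with a null date.
rows = [
    {"city": "Paris", "order_date": "2023-01-15", "amount": 120.0},
    {"city": "PARIS", "order_date": "2023-01-15", "amount": 120.0},
    {"city": "Berlin", "order_date": None, "amount": 80.0},
    {"city": "Rome", "order_date": "2024-07-02", "amount": 45.5},
]
cleaned = clean_rows(rows, text_fields=["city"], date_field="order_date")
for row in cleaned:
    print(row)
```

Of the four sample rows, two remain: the row with the null date is dropped, and the second "PARIS" row is removed as a duplicate once its text is lowercased.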
Next, the workflow applies data transformation and feature engineering steps using the Vector Assembler, Standard Scaler, String Indexer, and One Hot Encoder nodes to prepare the dataset for modeling. The Summary and Correlation nodes help assess relationships between variables, while Print N Rows displays intermediate outputs for verification.
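To make these transformations concrete, the following sketch shows plain-Python stand-ins for string indexing, one-hot encoding, standard scaling, vector assembly, and a Pearson correlation check. All function names and sample values are hypothetical, and the choices made here (frequency-ordered indices, population standard deviation, one slot per category) are assumptions; the nodes' actual defaults may differ.

```python
from statistics import mean, pstdev


def string_index(values):
    """Map each category to an integer index, most frequent first."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: i for i, category in enumerate(ordered)}


def one_hot(index, size):
    """Return a vector of length `size` with a 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def standard_scale(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up sample columns.
cities = ["paris", "rome", "paris", "berlin"]
amounts = [10.0, 20.0, 30.0, 40.0]
months = [1, 2, 3, 4]

mapping = string_index(cities)
scaled = standard_scale(amounts)

# "Assemble" one feature vector per row: one-hot city followed by scaled amount.
features = [
    one_hot(mapping[city], len(mapping)) + [amount]
    for city, amount in zip(cities, scaled)
]

for vector in features[:2]:  # inspect the first rows, similar in spirit to Print N Rows
    print(vector)
print("correlation(month, amount):", pearson(months, amounts))
```

Indexing before one-hot encoding keeps the category-to-position mapping explicit, which makes the assembled vectors easier to check when inspecting intermediate rows.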
The result is clean, structured, feature-rich data that is ready for analysis or machine learning.