Workflow Automation Templates
A library of ready-to-use workflow templates to accelerate your data journey
Feature Extraction for Text Data
Convert text into numerical features

Overview
This workflow extracts numerical features from text data using TF-IDF (Term Frequency–Inverse Document Frequency). It converts raw text into numerical vectors that weight each term by how often it appears in a document and how rare it is across the dataset, so that machine learning models can use the text for prediction and classification.
Details
The workflow begins by loading the Spam dataset, which contains message text. The Tokenizer node splits each message into individual words (tokens). The Hashing TF node then maps each token to an index in a fixed-size feature vector and counts occurrences to produce term frequencies. The IDF node rescales these frequencies by how rare each term is across documents, down-weighting terms that appear in most messages, to produce TF-IDF scores.
Finally, the Print N Rows node displays the transformed data for inspection. Because hashing produces fixed-size vectors without building a vocabulary first, the approach scales to large text collections and suits spam detection, sentiment analysis, and other natural language processing tasks.
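The tokenize, hash, and rescale steps described above can be sketched in plain Python. This is an illustrative approximation, not the workflow's actual implementation: the sample messages, the NUM_FEATURES value, the CRC32 hash, and the smoothed IDF formula log((N + 1) / (df + 1)) are all assumptions chosen to keep the example self-contained.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 16  # assumed vector size; real pipelines typically use far more buckets


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map each token to a bucket index and count occurrences per bucket.

    CRC32 is used only because it is deterministic and in the standard
    library; the hash used by the actual Hashing TF node may differ.
    """
    buckets = (zlib.crc32(tok.encode("utf-8")) % num_features for tok in tokens)
    return dict(Counter(buckets))


def fit_idf(tf_vectors):
    """Compute a smoothed IDF weight per bucket: log((N + 1) / (df + 1))."""
    n_docs = len(tf_vectors)
    doc_freq = Counter()
    for vec in tf_vectors:
        doc_freq.update(vec.keys())  # each document counts once per bucket
    return {idx: math.log((n_docs + 1) / (df + 1)) for idx, df in doc_freq.items()}


def transform(tf_vectors, idf):
    """Multiply each term frequency by the IDF weight of its bucket."""
    return [{idx: tf * idf[idx] for idx, tf in vec.items()} for vec in tf_vectors]


# Made-up messages for illustration; these are not rows from the Spam dataset.
messages = [
    "Win a free prize now",
    "Are we still meeting for lunch",
    "Free entry to win cash now",
]

tf_vectors = [hashing_tf(tokenize(m)) for m in messages]
idf = fit_idf(tf_vectors)
tfidf_vectors = transform(tf_vectors, idf)

# Rough equivalent of the Print N Rows step: show the first two rows.
for message, vec in zip(messages[:2], tfidf_vectors[:2]):
    print(message, sorted(vec.items()))
```

With so few buckets, unrelated words can collide in the same index, which is the usual trade-off of feature hashing; a larger vector size reduces collisions at the cost of memory. Note also that under this formula a bucket present in every document gets a weight of zero.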