
Workflow Automation Templates

A library of ready-to-use workflow templates to accelerate your data journey

Feature Extraction for Text Data

Convert text into numerical features

Overview

This workflow converts text data into numerical features using TF-IDF (Term Frequency–Inverse Document Frequency). Each document becomes a numerical vector in which terms that occur often in that document but rarely across the dataset receive higher weights. Machine learning models can then use these vectors as input for text-based prediction and classification.

Details

The workflow begins by loading the Spam dataset, which contains message text. The Tokenizer node splits each message into individual words, or tokens. The Hashing TF node then hashes each token to an index in a fixed-length vector and counts occurrences, producing term frequencies. The IDF node rescales these frequencies according to how rare each term is across documents, producing TF-IDF scores.
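The three transformation steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's own implementation: the sample messages, the function names, the 32-bucket vector size, and the use of `zlib.crc32` as the hash are all assumptions made for the example. The smoothed IDF formula `log((N + 1) / (df + 1))` is one common variant and is likewise an assumption about what the IDF node computes.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 32  # hypothetical vector size; real pipelines use far more buckets


def tokenize(text):
    """Lowercase the text and split it on whitespace (assumed tokenizer behavior)."""
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Hash each token to a bucket index and count occurrences per bucket."""
    counts = Counter()
    for token in tokens:
        # crc32 is a stand-in hash, chosen because it is deterministic across runs
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


def fit_idf(tf_vectors):
    """Compute a smoothed IDF weight for every bucket seen in the corpus."""
    n_docs = len(tf_vectors)
    doc_freq = Counter()
    for vec in tf_vectors:
        doc_freq.update(vec.keys())  # each document counts once per bucket
    return {i: math.log((n_docs + 1) / (df + 1)) for i, df in doc_freq.items()}


def transform(tf_vectors, idf):
    """Multiply each term frequency by the IDF weight of its bucket."""
    return [{i: tf * idf[i] for i, tf in vec.items()} for vec in tf_vectors]


# Made-up sample messages standing in for the Spam dataset
messages = [
    "win a free prize now",
    "are we meeting for lunch today",
    "free entry win cash now",
]

tf_vectors = [hashing_tf(tokenize(m)) for m in messages]
idf = fit_idf(tf_vectors)
tfidf_vectors = transform(tf_vectors, idf)

for message, vec in zip(messages, tfidf_vectors):
    print(message, "->", {i: round(w, 3) for i, w in sorted(vec.items())})
```

Because tokens are hashed straight to indices, no vocabulary has to be built or stored, at the cost of occasional collisions where two different tokens share a bucket. Increasing the number of buckets makes collisions less likely.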

Finally, the Print N Rows node displays the first rows of the transformed data for inspection. The resulting TF-IDF vectors can serve as model input for spam detection, sentiment analysis, and other natural language processing tasks.
