Workflow Automation Templates
Anomaly Detection Using Isolation Forest Model
Detect anomalies using Isolation Forest

Overview
This workflow identifies unusual or anomalous data points using the H2O Isolation Forest algorithm. It chains dimensionality reduction, a train/test split, model training, and scoring to flag outliers in complex, multi-feature datasets.
Details
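The workflow begins by importing data with the Read CSV node. The H2O PCA node then projects the features onto a smaller set of principal components, which reduces redundancy among correlated features and can lower noise and training cost for the downstream model. The reduced dataset is divided into training and test sets with the Split with Stratified Sampling node, which keeps the proportion of each category in the chosen stratification column roughly the same in both sets. Stratification preserves the original class proportions; it does not balance the classes.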
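The split step can be illustrated with a minimal stratified split in plain Python. This is a sketch of the technique, not the Split with Stratified Sampling node's implementation: the helper name `stratified_split`, the `label` field, and the 20% test fraction are hypothetical. Rows are grouped by their stratification value, and each group is shuffled and cut separately, so every class keeps roughly its original share in both sets.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=None):
    """Split rows into (train, test), keeping each label's share roughly equal in both.

    Hypothetical helper for illustration; not the workflow node's actual code.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    train, test = [], []
    for label in sorted(groups, key=str):  # fixed order keeps results reproducible per seed
        members = list(groups[label])
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data (hypothetical): 95 normal rows and 5 rows labeled as known anomalies.
rows = [{"id": i, "label": "anomaly" if i < 5 else "normal"} for i in range(100)]
train, test = stratified_split(rows, label_of=lambda r: r["label"], test_fraction=0.2, seed=7)

test_anomalies = sum(r["label"] == "anomaly" for r in test)
print(len(train), len(test), test_anomalies)  # 80 20 1
```

With 95 normal rows and 5 anomalies, a 20% test fraction places 19 normal rows and 1 anomaly in the test set, so both sets keep the 5% anomaly rate.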
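The H2O Isolation Forest node trains the anomaly detection model on the training data. It builds an ensemble of trees that recursively partition the data with random splits; because anomalies are few and differ from the bulk of the data, they tend to be isolated after fewer splits, that is, at shorter path lengths. The H2O Score node then applies the trained model to the test data and produces an anomaly score for each row: the shorter a row's average path length across the trees, the higher its score and the more it deviates from normal patterns.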
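The isolation idea described above can be sketched from scratch in plain Python. This follows the original Isolation Forest formulation rather than H2O's implementation; the class and function names, the 100-tree and 128-point subsample settings, and the 2-D toy data are assumptions for illustration. Each tree splits on a random feature at a random value, and a point's score is s(x) = 2^(-E[h(x)] / c(psi)), where E[h(x)] is its average path length over the trees and c(psi) is the average path length of an unsuccessful binary-search-tree search over psi points. Scores close to 1 indicate likely anomalies; scores well below 0.5 indicate normal points.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until isolated or max_depth is hit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only features that still vary within this node can be split.
    spans = []
    for d in range(len(points[0])):
        values = [p[d] for p in points]
        lo, hi = min(values), max(values)
        if lo < hi:
            spans.append((d, lo, hi))
    if not spans:  # all remaining points are identical
        return {"size": len(points)}
    dim, lo, hi = rng.choice(spans)
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Edges from root to x's leaf, plus c(size) for points left unseparated in that leaf."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + avg_path_length(node["size"])


class IsolationForest:
    """Minimal illustrative isolation forest (hypothetical class, not H2O's API)."""

    def __init__(self, n_trees=100, sample_size=256, seed=None):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees = []
        self.psi = 0

    def fit(self, points):
        if len(points) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(points))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the original method
        self.trees = [
            build_tree(rng.sample(points, self.psi), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """s(x) = 2 ** (-E[h(x)] / c(psi)); higher means more anomalous."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / avg_path_length(self.psi))


# Toy usage: a Gaussian cloud plus one far-away point.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
forest = IsolationForest(n_trees=100, sample_size=128, seed=42).fit(data)
outlier_score = forest.score((8.0, 8.0))
inlier_score = forest.score((0.0, 0.0))
print(f"outlier: {outlier_score:.3f}, inlier: {inlier_score:.3f}")
```

Training each tree on a small random subsample, rather than the full dataset, keeps trees shallow and cheap to build; the path-length limit works because anomalies are usually isolated well before that depth.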
Print N Rows displays a preview of the scored results, and H2O ML Model Save stores the trained model for reuse or deployment. The workflow is well suited to fraud detection, network intrusion detection, sensor fault monitoring, and other rare-event detection tasks on large datasets.
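Print N Rows shows a fixed number of rows; when reviewing anomalies, it can help to sort by score before previewing. The sketch below, in plain Python, lists the highest-scoring rows and persists a model object with `pickle`. The helper names are hypothetical, and `pickle` is only a stand-in: H2O ML Model Save relies on H2O's own model serialization.

```python
import heapq
import os
import pickle
import tempfile


def top_n_anomalies(rows, scores, n=5):
    """Return the n (score, row) pairs with the highest anomaly scores, highest first."""
    # key= avoids comparing rows themselves when scores tie.
    return heapq.nlargest(n, zip(scores, rows), key=lambda pair: pair[0])


def save_model(model, path):
    """Persist any picklable model object (stand-in for H2O ML Model Save)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Reload a model saved with save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical scored rows.
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
scores = [0.42, 0.81, 0.55]
preview = top_n_anomalies(rows, scores, n=2)
for score, row in preview:
    print(f"id={row['id']} score={score:.2f}")

path = os.path.join(tempfile.mkdtemp(), "iforest.pkl")
save_model({"n_trees": 100, "sample_size": 256}, path)
restored = load_model(path)
```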