Workflow Automation Templates

A library of ready-to-use workflow templates to accelerate your data journey

H2O Classification

Predict diabetes

Overview

This workflow builds a classification model using the H2O Distributed Random Forest (DRF) algorithm to predict whether a patient is diabetic based on medical data. It includes data splitting, model training, scoring, and model saving.

Details

The workflow starts by loading the diabetes dataset and splitting it into training and testing subsets using the Split node. The H2O Distributed Random Forest node trains the model on the training data to classify patients as diabetic or non-diabetic.
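The splitting step can be illustrated with a minimal sketch in plain Python. This is not the Split node's implementation; the function name split_rows, the 80/20 ratio, and the seed are assumptions chosen for illustration only.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists.

    train_fraction and seed are illustrative defaults, not values
    taken from the workflow.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder records standing in for patient rows.
train, test = split_rows(range(10))
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so a model trained on the training subset is always evaluated against the same held-out rows.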

Predictions are generated using the H2O Score node, which applies the model to the test data and computes performance metrics. The Print N Rows node previews prediction results, while the H2O ML Model Save node stores the trained model for future use.
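The page does not say which performance metrics the H2O Score node reports. As a rough sketch of what scoring a binary classifier involves, the following plain-Python example compares predicted labels against actual labels and derives accuracy, precision, and recall. The function names and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary label."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from confusion counts."""
    tp, fp, tn, fn = (counts[k] for k in ("tp", "fp", "tn", "fn"))
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 = diabetic, 0 = non-diabetic.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(counts)
print(metrics)
```

For a medical screening task, recall on the diabetic class is usually worth reading alongside accuracy, because a missed positive case is often costlier than a false alarm.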

Together, these steps form an end-to-end classification pipeline for medical data, built on H2O and PySpark.
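For readers who want to reproduce the same sequence outside the visual workflow, the steps described above can be approximated with the open-source H2O Python API. This is a sketch under stated assumptions, not the code the workflow nodes generate: it uses the h2o package directly rather than PySpark, and the CSV path, target column name, output directory, tree count, split ratio, and seed are all placeholders supplied by the caller.

```python
def predictor_columns(columns, target):
    """Return every column name except the target."""
    return [c for c in columns if c != target]


def run_pipeline(csv_path, target, model_dir, train_fraction=0.8, seed=42):
    """Split, train a DRF model, score it, preview rows, and save it.

    All arguments are hypothetical placeholders; nothing here is taken
    from the workflow's actual node configuration.
    """
    # Imported here so the module still loads where h2o is not installed.
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)

    # A categorical target makes H2O train a classifier, not a regressor.
    frame[target] = frame[target].asfactor()

    # split_frame assigns rows probabilistically, so sizes are approximate.
    train, test = frame.split_frame(ratios=[train_fraction], seed=seed)

    model = H2ORandomForestEstimator(ntrees=50, seed=seed)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=train,
    )

    # Score the held-out data and preview a few predictions.
    performance = model.model_performance(test_data=test)
    predictions = model.predict(test)
    print(predictions.head(rows=5))

    # Persist the trained model for later reuse.
    saved_path = h2o.save_model(model=model, path=model_dir, force=True)
    return performance, saved_path
```

A call would look like run_pipeline("diabetes.csv", "Outcome", "models"), where the file name, column name, and directory are hypothetical. Running it requires the h2o package and a Java runtime.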
