
Workflow Automation Templates

A library of ready-to-use workflow templates to accelerate your data journey

Random Forest Sales Prediction

Predict weekly sales with Random Forest

Overview

This workflow builds a Random Forest regression model with Spark ML to forecast weekly sales. It covers feature engineering, a train/test split, model training, and evaluation.

Details

The process starts by importing sales data with the Read CSV node. The String Indexer node encodes categorical columns as numeric indices, and the Vector Assembler node combines the indexed and numeric columns into a single feature vector that the model can consume. The Split node then divides the dataset into training and testing subsets.
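To make the two feature-preparation steps concrete, here is a minimal plain-Python sketch of what string indexing and vector assembly do to each row. It does not call Spark; it only imitates the default behaviour of Spark ML's StringIndexer (most frequent label gets index 0). The column names `store_type`, `temperature`, and `fuel_price` are hypothetical, since the template does not list its schema.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct label to an index, most frequent label first.

    Imitates the default ordering of Spark ML's StringIndexer
    (descending frequency); ties are broken alphabetically here.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def assemble(row, index, categorical_col, numeric_cols):
    """Concatenate the indexed category and the numeric columns into one feature list."""
    return [index[row[categorical_col]]] + [float(row[c]) for c in numeric_cols]


# Hypothetical sample rows; not taken from the template's dataset.
rows = [
    {"store_type": "B", "temperature": 15.0, "fuel_price": 2.6},
    {"store_type": "A", "temperature": 21.5, "fuel_price": 2.7},
    {"store_type": "B", "temperature": 9.0, "fuel_price": 2.5},
]

index = fit_string_index(r["store_type"] for r in rows)
features = [assemble(r, index, "store_type", ["temperature", "fuel_price"]) for r in rows]
```

Here `"B"` appears twice and so receives index 0.0, while `"A"` receives 1.0; each row then becomes a single numeric list, which is the shape a tree-based regressor expects.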

The Random Forest Regression node trains a model on the training subset to estimate weekly sales from the assembled features. The Predict node generates predictions for the testing subset, and the Regression Evaluator node scores them with metrics such as RMSE (root mean squared error), where lower values indicate a closer fit.

Finally, the Print N Rows node displays the first few prediction rows for a quick sanity check. Because each step runs on Spark ML, the workflow can scale from sample data to large sales histories for time-based forecasting.
