This tutorial shows how to build a random forest model that predicts customer churn in the telecommunications market.
The workflow:
Reads the dataset from a tab-separated file.
Applies StringIndexer to the “intl_plan” field.
Applies VectorAssembler to the fields used as model features.
Splits the dataset into two parts in a (0.8, 0.2) ratio: 80% for training and 20% for testing.
Performs Random Forest Classification.
Uses the trained model to make predictions on the remaining 20% of the dataset.
Finally, evaluates the prediction results.
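To make the data preparation steps concrete, here is a minimal plain-Python sketch of what the read, index, assemble, and split steps do to the data. It is a stand-in written with the standard library only, not the Spark or Sparkflows API. The sample rows and every column name except intl_plan are hypothetical, and the model training, prediction, and evaluation steps are left out.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical sample data: only "intl_plan" comes from the tutorial,
# the other column names and all values are made up for illustration.
SAMPLE_TSV = (
    "intl_plan\tday_mins\tcust_serv_calls\tchurned\n"
    "no\t265.1\t1\t0\n"
    "no\t161.6\t1\t0\n"
    "yes\t299.4\t2\t1\n"
    "no\t166.7\t3\t0\n"
    "yes\t223.4\t0\t0\n"
    "no\t218.2\t3\t0\n"
    "no\t157.0\t0\t0\n"
    "yes\t184.5\t4\t1\n"
    "no\t258.6\t0\t0\n"
    "no\t129.1\t4\t1\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def string_index(rows, column):
    """Map each distinct string to a numeric index, most frequent value first.

    This mirrors the idea behind StringIndexer: a categorical string column
    becomes a numeric column. Ties are broken alphabetically here.
    """
    counts = Counter(row[column] for row in rows)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: float(index) for index, value in enumerate(ordered)}


def assemble(row, mapping, numeric_columns):
    """Build one feature vector per row, the idea behind VectorAssembler."""
    return [mapping[row["intl_plan"]]] + [float(row[c]) for c in numeric_columns]


def split(rows, train_fraction=0.8, seed=42):
    """Shuffle and cut the rows into (train, test).

    This cut is exact; a sampling-based split would only be approximately 80/20.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = read_tsv(SAMPLE_TSV)
mapping = string_index(rows, "intl_plan")
features = [assemble(r, mapping, ["day_mins", "cust_serv_calls"]) for r in rows]
train_rows, test_rows = split(rows)
print(mapping)
print(features[0])
print(len(train_rows), len(test_rows))
```

In the actual workflow, the training rows would feed the random forest classifier and the held-out rows would be used for prediction and evaluation.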
The full tutorial is available here: https://docs.sparkflows.io/en/latest/tutorials/data-science/sparkml/telco-churn-prediction.html