Data Exploration of Housing Data

Sparkflows
Aug 18, 2021

Updated: Oct 6, 2021

This workflow shows how to explore the Housing Dataset from Kaggle with Sparkflows.

Workflow

This workflow:

Reads the Housing dataset
Calculates summary statistics for important variables
Creates a histogram to show the distribution of the Sale Price variable
Creates a graph to show the relationship between Sale Price and Basement Square Footage
Creates a matrix to show the correlation between important variables
Flags outliers in Ground Living Area and graphs the results

[Screenshot: workflow diagram]

Reading Housing Dataset

The DatasetStructured Processor creates a DataFrame from the dataset named Housing Training. It reads the data from a source such as HDFS or Hive, using the dataset definition created earlier in Fire with the Dataset feature.
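Outside Fire, the same step amounts to loading a delimited file into typed rows. Here is a minimal sketch in plain Python, not the processor's actual implementation. It assumes a CSV file with a header row. The column names `SalePrice`, `TotalBsmtSF` and `GrLivArea` are assumptions about how the dataset names Sale Price, Basement Square Footage and Ground Living Area, and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up rows for illustration only; real data would come from the Housing file.
SAMPLE_CSV = """Id,SalePrice,TotalBsmtSF,GrLivArea
1,200000,850,1700
2,180000,1200,1250
3,225000,900,1800
"""


def read_housing(source):
    """Parse CSV text from a file-like object into a list of dicts of floats."""
    reader = csv.DictReader(source)
    rows = []
    for record in reader:
        # Every assumed column is numeric, so convert each field to float.
        rows.append({name: float(value) for name, value in record.items()})
    return rows


housing = read_housing(io.StringIO(SAMPLE_CSV))
print(len(housing), "rows; columns:", list(housing[0]))
```

To read a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.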

Processor Output

[Screenshot: DatasetStructured processor output]

Calculate Summary Statistics

The Summary Statistics Processor calculates summary statistics for the selected variables.
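To make the idea concrete, here is a small sketch of per-column summary statistics in plain Python. The post does not list which statistics the processor reports, so the set below (count, mean, sample standard deviation, min, max) is an assumption, as are the column names and the made-up values.

```python
import statistics


def summarize(values):
    """Return count, mean, sample standard deviation, min and max of a list."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Made-up values for illustration only; column names are assumed.
columns = {
    "SalePrice": [200000, 180000, 225000, 140000],
    "GrLivArea": [1700, 1250, 1800, 1000],
}

for name, values in columns.items():
    print(name, summarize(values))
```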

Processor Configuration

[Screenshot: Summary Statistics processor configuration]

Processor Output

[Screenshot: Summary Statistics processor output]

Create Histogram Graph

The HistoGram Processor creates a histogram showing the distribution of Sale Price by count.
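A histogram "by count" boils down to splitting the value range into bins and counting how many rows fall into each one. The sketch below assumes equal-width bins. The bin count, the function name `histogram` and the sale prices are all made up for illustration, and the processor may bin differently.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals from min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value would compute index == bins, so clamp it
            # into the last bin.
            index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


sale_prices = [140000, 180000, 200000, 225000, 310000]  # made-up values
edges, counts = histogram(sale_prices, bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>9.0f} - {right:>9.0f}: {'#' * count}")
```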

Processor Configuration

[Screenshot: HistoGram processor configuration]

Processor Output

[Screenshot: HistoGram processor output]

Graph Values

The Graph Values Processor graphs the relationship between Sale Price and Basement Square Footage.

Processor Configuration

[Screenshot: Graph Values processor configuration]

Processor Output

[Screenshot: Graph Values processor output]

Plot Correlation Matrix

The Correlation Processor creates a correlation matrix of the selected variables and plots the results.
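For readers who want to see what goes into such a matrix, here is a sketch that computes pairwise correlations in plain Python. It assumes the Pearson coefficient, which the post does not confirm. The column names and values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name: {other_name: r}} for every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Made-up values for illustration only; column names are assumed.
columns = {
    "SalePrice": [200000, 180000, 225000, 140000],
    "TotalBsmtSF": [850, 1200, 900, 700],
    "GrLivArea": [1700, 1250, 1800, 1000],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(f"{name:<12}", [round(r, 2) for r in row.values()])
```

The matrix is symmetric with ones on the diagonal, so a plot only needs one triangle of it.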

Processor Configuration

[Screenshot: Correlation processor configuration]

Processor Output

[Screenshot: Correlation processor output]

Flag Outliers and Create Graph

The Flag Outlier Processor adds a new flag column that marks outliers in Ground Living Area, and the Graph Group by Column Processor graphs the number of rows in each flag category.
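The two steps can be sketched together: flag each value, then count rows per flag. The post does not say which rule the processor uses to decide what counts as an outlier, so the sketch assumes the common 1.5 × IQR rule. The multiplier `k`, the function name `flag_outliers` and the living-area values are illustrative only.

```python
import statistics
from collections import Counter


def flag_outliers(values, k=1.5):
    """Return one flag per value: True if outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [value < lower or value > upper for value in values]


gr_liv_area = [1000, 1250, 1400, 1500, 1700, 1800, 5600]  # made-up values
flags = flag_outliers(gr_liv_area)

# Group by the new flag column and count the rows in each category.
counts = Counter("outlier" if flag else "normal" for flag in flags)
print(dict(counts))
```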

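Processor Configuration

[Screenshot: Flag Outlier and Graph Group by Column processor configuration]

Processor Output

[Screenshot: Flag Outlier and Graph Group by Column processor output]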
