
Installation

deploy.png

Overview

Sparkflows can be installed in the cloud or on-premises. Supported platforms include AWS, Azure, Google Cloud, Databricks, Cloudera, and Hortonworks.

AWS_SPARKFLOWS_ARCHITECTURE.png

AWS

Sparkflows can be installed on AWS. It can be deployed on a standalone EC2 machine, where it reads data from sources such as S3 and Redshift, processes it, and writes the results back to S3, Redshift, and similar targets.

Alternatively, it can be installed on the edge node of an EMR cluster. In this case, it submits the jobs to the EMR cluster for processing.

GCP_architecture.png

GCP

Sparkflows can be installed on GCP. It can be deployed on a standalone Compute Engine VM, where it reads data from Google Cloud Storage (GCS) and other sources, processes it, and writes the results back to GCS or other targets.

Alternatively, it can be installed on the edge node of a Dataproc cluster. In this case, it submits the jobs to the Dataproc cluster for processing.

sparkflows_azure_hdinsights.png

Azure

Sparkflows can be installed on Azure. It can be deployed on a standalone machine, where it reads data from sources such as ADLS and SQL Server, processes it, and writes the results back to ADLS, SQL Server, and similar targets.

Alternatively, it can be installed on the edge node of an HDInsight cluster. In this case, it submits the jobs to the HDInsight cluster for processing.

sparkflows_cloudera.png

Cloudera

Sparkflows can be installed on the edge node of a Cloudera cluster, from which it submits jobs to the cluster. It interacts with Hive, HDFS, Kafka, and other cluster services.

sparkflows_azure_databricks.png

Databricks

Sparkflows can be installed on one or more machines, and the jobs are submitted to the Databricks cluster.

SPARKFLOWS-STANDALONE-ARCHITECTURE.png

Laptop

Sparkflows can be installed on a standalone machine, such as a laptop.
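To make the deployment modes on this page easier to compare, here is a minimal Python sketch that encodes them as data. The names `DeploymentOption`, `DEPLOYMENT_OPTIONS`, `options_for`, and `submits_to_cluster` are hypothetical and are not part of Sparkflows; the sketch only restates what this page describes. GCP follows the same two modes as AWS and is omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical summary of the deployment options described on this page.
# These names are illustrative only; they are NOT part of any Sparkflows API.


@dataclass(frozen=True)
class DeploymentOption:
    platform: str      # e.g. "AWS", "Azure", "Cloudera"
    installed_on: str  # where the Sparkflows server is installed
    jobs_run_on: str   # where the data-processing jobs execute


DEPLOYMENT_OPTIONS = [
    DeploymentOption("AWS", "standalone EC2 machine", "same machine"),
    DeploymentOption("AWS", "EMR edge node", "EMR cluster"),
    DeploymentOption("Azure", "standalone machine", "same machine"),
    DeploymentOption("Azure", "HDInsight edge node", "HDInsight cluster"),
    DeploymentOption("Cloudera", "cluster edge node", "Cloudera cluster"),
    DeploymentOption("Databricks", "one or more machines", "Databricks cluster"),
    DeploymentOption("Laptop", "standalone machine", "same machine"),
]


def options_for(platform: str) -> list[DeploymentOption]:
    """Return every listed option for a platform (case-insensitive match)."""
    wanted = platform.lower()
    return [o for o in DEPLOYMENT_OPTIONS if o.platform.lower() == wanted]


def submits_to_cluster(option: DeploymentOption) -> bool:
    """True when Sparkflows hands jobs off to a separate cluster."""
    return option.jobs_run_on != "same machine"


if __name__ == "__main__":
    # Print both AWS modes: standalone processing vs. submitting to EMR.
    for opt in options_for("aws"):
        mode = "cluster" if submits_to_cluster(opt) else "standalone"
        print(f"{opt.installed_on} -> {opt.jobs_run_on} ({mode})")
```

The one distinction the sketch captures is the one this page draws: in a standalone install Sparkflows processes data on the machine it runs on, while an edge-node install hands the jobs to the cluster.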
