Run your Sparkflows Workflows on Databricks

Sparkflows is integrated with Databricks. Below are the key integration points:

  • Sparkflows can be configured to talk to the Databricks endpoint.

  • Datasets can be created in Sparkflows pointing to your tables in Databricks.

  • Workflows can be created in Sparkflows with these datasets.

  • Workflows from Sparkflows are run on the Databricks cluster.

  • As a workflow runs, summary results from its nodes are streamed back to Sparkflows and displayed.

Note: To make a result produced in Sparkflows available in your Databricks Notebooks, create a temp table from that result with the RegisterTempTable node in Sparkflows.

Enable Databricks in Sparkflows

In the Administration/Configuration tab within the Fire UI:

  • Set databricks.enabled to true. 

  • Set app.postMessageURL to point to the public IP/hostname of the machine on which Sparkflows is installed. Results from the Sparkflows jobs running in Databricks are streamed back to this URL.
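For illustration, the two settings might look like the following; the host and port are placeholders for your own installation:

    databricks.enabled = true
    app.postMessageURL = http://<public-ip-or-hostname>:<port>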

Configure the Databricks Endpoint in Sparkflows

In Databricks/Configuration, configure the Databricks endpoint. The Databricks username, password, and endpoint URL are needed to set it up. For security reasons, these values are never saved to disk; they are kept in memory only, so the endpoint has to be configured again each time you log back in.
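Before entering these values, you can optionally sanity-check them against the Databricks REST API, for example from Python. This is only an illustrative sketch; the host, username, and password are placeholders:

    # Sketch: verify the Databricks endpoint and credentials outside Sparkflows.
    # The host, username, and password below are placeholders.
    import requests

    host = "https://<your-account>.cloud.databricks.com"
    resp = requests.get(
        host + "/api/2.0/clusters/list",
        auth=("<username>", "<password>"),  # basic auth, as used by Sparkflows
    )
    resp.raise_for_status()  # fails if the endpoint or credentials are wrong
    print([c["cluster_name"] for c in resp.json().get("clusters", [])])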

Create a library on Databricks with the fire jar

The workflows from Sparkflows are powered by the code in the fire jar file, located in the directory sparkflows-x.y.z/fire-lib.

Upload the fire jar with dependencies, fire-core-x.y.z-jar-with-dependencies.jar, to Databricks as a new library: use the Library link in Databricks to create the library and upload the jar. The file is ~155 MB. For x.y.z, substitute the version you are using.
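The upload can also be scripted. Assuming the jar has already been copied to DBFS (for example with the Databricks CLI), a sketch along these lines attaches it to a cluster through the Libraries REST API; the host, credentials, cluster id, and DBFS path are all placeholders:

    # Sketch: attach the fire jar (already on DBFS) to a cluster via the
    # Databricks Libraries API. All values below are placeholders.
    import requests

    host = "https://<your-account>.cloud.databricks.com"
    payload = {
        "cluster_id": "<your-cluster-id>",
        "libraries": [
            {"jar": "dbfs:/FileStore/jars/fire-core-x.y.z-jar-with-dependencies.jar"}
        ],
    }
    resp = requests.post(host + "/api/2.0/libraries/install",
                         auth=("<username>", "<password>"), json=payload)
    resp.raise_for_status()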

Create a table on Databricks or use an existing one
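If you do not have a table yet, one simple way to create one from a Databricks notebook is to load a file and save it as a table. A minimal sketch, assuming a CSV file; the path and table name are placeholders:

    # Sketch: create a table in a Databricks notebook from a CSV file.
    # The file path and table name are placeholders; `spark` is the
    # SparkSession predefined in Databricks notebooks.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/FileStore/tables/housing.csv"))
    df.write.saveAsTable("housing")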

Create a dataset in Sparkflows with a table in Databricks

In Sparkflows, create a new Dataset pointing to your table on Databricks. The tables available in Databricks are listed; select one for your Dataset.

Dataset details

Click the Update button to fetch the schema of the table and save it.

Use the Dataset created in a Workflow

The workflow below builds a Random Forest model for predicting the housing price.
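Conceptually, the nodes in such a workflow map to standard Spark ML steps. A rough PySpark equivalent is sketched below; the table, feature, and label column names are placeholders, not the workflow's actual configuration:

    # Sketch of the Spark ML steps a Random Forest workflow performs.
    # Table and column names below are placeholders.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor

    df = spark.table("housing")
    assembler = VectorAssembler(
        inputCols=["rooms", "crime_rate", "tax"],  # placeholder feature columns
        outputCol="features",
    )
    rf = RandomForestRegressor(labelCol="price", featuresCol="features")
    model = rf.fit(assembler.transform(df))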

Execute the Workflow in Sparkflows

The workflow gets submitted to your Databricks cluster and is executed there.

Results are streamed back to be displayed in Sparkflows

Random Forest generated from the Workflow Execution is displayed in Sparkflows

Use RegisterTempTable to view the results from Sparkflows in Databricks Notebooks

The workflow below shows the RegisterTempTable node in use. When the workflow executes, the temp table becomes accessible in the Databricks Notebook.
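Once the workflow has run, the temp table can be queried from a notebook attached to the same cluster. A minimal sketch; "predictions" is a placeholder for whatever name you configured in the RegisterTempTable node:

    # Sketch: read the temp table registered by the Sparkflows workflow.
    # "predictions" is a placeholder for the name configured in the node.
    spark.sql("SELECT * FROM predictions").show()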
