Saving Data To HIVE

Workflows let us build data pipelines in which data is transformed and models can be generated. At some point we might want to save the resulting data to a HIVE table, which also makes the data accessible through BI tools like Tableau.

 

Sparkflows has one node for saving data to HIVE:

 

  • SaveAsTable : Saves the data into a HIVE table

 

Cluster vs Standalone Mode

  • Sparkflows can run in Cluster mode or in Standalone mode. The mode is set under Administration/Configuration through the parameter app.runOnCluster.

  • When connecting to HIVE, Sparkflows should be running in Cluster mode on an edge node of a Hadoop cluster. The HIVE settings must also be set correctly under Administration/Configuration.

 

Workflow containing SaveAsTable

 

The example workflow contains a SaveAsTable node. It reads in the Housing Dataset and saves it into the HIVE table 'housing_table'.

Results of execution of Workflow

When the workflow is executed, the data is written into the HIVE table 'housing_table'.

The table 'housing_table' is created with the schema of the Housing Dataset.
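For readers who want to see what this step amounts to outside the workflow editor, here is a minimal PySpark sketch of an equivalent write. It is not Sparkflows' own implementation, which is not shown here. The helper name save_as_hive_table, the function run_example, the application name, and the input path data/housing.csv are all hypothetical; only the table name 'housing_table' comes from the text above. The sketch assumes a Spark installation with HIVE support configured.

```python
def save_as_hive_table(df, table_name, mode="error"):
    """Write a Spark DataFrame to a HIVE table (hypothetical helper).

    mode is passed to DataFrameWriter.mode and can be "append",
    "overwrite", "ignore" or "error". The table takes its schema
    from the DataFrame.
    """
    df.write.mode(mode).saveAsTable(table_name)


def run_example():
    """Read a housing CSV file and save it as 'housing_table'.

    Assumes pyspark is installed and HIVE is reachable from the
    machine running this code. The input path is a placeholder.
    """
    # Imported here so the helper above can be defined without pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("save-housing-to-hive")  # hypothetical application name
        .enableHiveSupport()              # needed to write to the HIVE metastore
        .getOrCreate()
    )

    # Hypothetical location of the Housing Dataset.
    housing = spark.read.csv("data/housing.csv", header=True, inferSchema=True)

    save_as_hive_table(housing, "housing_table", mode="overwrite")

    # Print the schema of the table that was just written.
    spark.table("housing_table").printSchema()
```

Calling run_example() on a machine configured for HIVE would write the table and print its schema. Using mode="overwrite" replaces an existing table of the same name, so "append" or "error" may be the safer choice for tables that already hold data.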

© 2020 Sparkflows, Inc. All rights reserved. 