Saving Data To HIVE
Workflows let us build data pipelines, in which data is transformed, models are generated, and so on. At some point we may want to save the resulting data to a HIVE table, which also makes the data accessible to BI tools such as Tableau.
Sparkflows has one node for saving data to HIVE:
SaveAsTable: saves the data into a HIVE table
Cluster vs Standalone Mode
Sparkflows can run in either Cluster mode or Standalone mode. The mode is set under Administration/Configuration through the parameter app.runOnCluster.
To connect to HIVE, Sparkflows should be running in Cluster mode on an edge node of a Hadoop cluster, and the HIVE settings must be set correctly under Administration/Configuration.
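As a rough pre-flight check, the rule above fits in a few lines. This is a sketch, not part of Sparkflows: it assumes the configuration values have been exported into a plain key/value mapping with string values such as "true". The function names runs_on_cluster and check_hive_ready are hypothetical, and treating a missing app.runOnCluster key as Standalone mode is an assumption rather than documented behavior.

```python
def runs_on_cluster(settings):
    """Return True if app.runOnCluster is set to true (sketch)."""
    # Assumption: a missing key is treated as Standalone mode.
    value = str(settings.get("app.runOnCluster", "false"))
    return value.strip().lower() == "true"


def check_hive_ready(settings):
    """Raise if the settings rule out writing to HIVE (hypothetical helper)."""
    # Writing to HIVE requires Cluster mode on an edge node of the Hadoop cluster.
    if not runs_on_cluster(settings):
        raise RuntimeError(
            "app.runOnCluster is not true; switch Sparkflows to Cluster mode "
            "before writing to HIVE")
```

The check only covers app.runOnCluster; the individual HIVE settings are not named in this page, so they are left out rather than guessed.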
Workflow containing SaveAsTable
Below is a workflow containing SaveAsTable. It reads in the Housing Dataset and saves it into the HIVE table 'housing_table'.
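To give a sense of what such a workflow roughly amounts to in Spark terms, the following sketch reads a dataset and writes it with Spark's DataFrameWriter.saveAsTable. It is not Sparkflows's implementation. It assumes the Housing Dataset is a CSV file with a header row, and the function names, the "overwrite" save mode, and the file path in the usage line are hypothetical choices. The SparkSession must be created with Hive support so that saveAsTable registers the table in the Hive metastore.

```python
def build_hive_session(app_name="save-housing-to-hive"):
    """Create a SparkSession that can read and write Hive tables."""
    # Imported inside the function so the rest of the module loads without PySpark.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()  # use the Hive metastore for saveAsTable
            .getOrCreate())


def save_housing_to_hive(spark, csv_path, table_name="housing_table"):
    """Read a CSV dataset and save it as a table (sketch of the workflow)."""
    # Take column names from the header row and let Spark infer column types,
    # so the new table is created with the dataset's schema.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(csv_path))
    # "overwrite" replaces the table if it already exists (hypothetical choice).
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

Usage, with a hypothetical path: spark = build_hive_session(), then save_housing_to_hive(spark, "/path/to/housing.csv"). Passing the session in as an argument keeps the read/write logic separate from cluster setup.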
Results of Executing the Workflow
When the workflow is executed, the data is written into the HIVE table 'housing_table'.
The table 'housing_table' is created with the schema of the Housing Dataset.
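One way to confirm that the table picked up the dataset's schema is to compare the table's columns with those of the DataFrame that was written. The sketch below assumes a Hive-enabled SparkSession named spark and the written DataFrame df are available; the function name schema_matches is hypothetical. Column names are compared case-insensitively because Hive treats column names case-insensitively.

```python
def schema_matches(spark, df, table_name="housing_table"):
    """Return True if the table's columns match the DataFrame's (sketch)."""
    def columns(schema):
        # (lower-cased name, type string) pairs, in column order
        return [(f.name.lower(), f.dataType.simpleString()) for f in schema.fields]

    return columns(spark.table(table_name).schema) == columns(df.schema)
```

Nullability is deliberately ignored, since a table read back from the metastore may report it differently from the in-memory DataFrame. For a quick manual look, spark.sql("DESCRIBE housing_table").show() lists the table's columns and types.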