MongoDB is a document database that combines the scalability and flexibility you want with the querying and indexing you need. Here we load data from HDFS and save it into MongoDB.
Workflow for Loading data into MongoDB
The workflow below reads the sample dataset, which is in CSV format, from HDFS.
It then saves the data into MongoDB.
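To make the CSV-to-MongoDB step concrete, here is a minimal sketch of how CSV rows can be turned into MongoDB-style documents (one dictionary per row, keyed by the header). It is not how the SaveMongoDB Processor is implemented; the helper names `coerce` and `csv_to_documents` and the sample data are hypothetical. The sketch stops before the actual write so it runs without a database; with a driver such as pymongo, the resulting list could be passed to `Collection.insert_many`.

```python
import csv
import io


def coerce(value: str):
    """Best-effort conversion of a CSV cell to int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def csv_to_documents(text: str) -> list[dict]:
    """Turn CSV text (header row first) into a list of MongoDB-style documents."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: coerce(val) for key, val in row.items()} for row in reader]


# Hypothetical sample data standing in for the dataset read from HDFS.
sample_csv = "id,name,price\n1,apple,0.5\n2,pear,0.75\n"
docs = csv_to_documents(sample_csv)
print(docs)
# [{'id': 1, 'name': 'apple', 'price': 0.5}, {'id': 2, 'name': 'pear', 'price': 0.75}]
```

Coercing numeric strings matters because MongoDB stores typed values: without it, every field would be saved as a string.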
The diagram below shows the dialog box for the SaveMongoDB Processor.
When we execute the workflow, it reads the dataset from HDFS and loads it into MongoDB.
Workflow for Reading data from MongoDB
The workflow below reads data from MongoDB and then prints it.
The diagram below shows the dialog box for the ReadMongoDB Processor.
In the dialog above, the ‘Refresh Schema’ button infers the schema of the collection. The Processor can therefore pass its output schema down to the next Processor, which makes it easy to build the rest of the workflow.
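Because MongoDB collections do not carry a fixed schema, a schema has to be inferred from the documents themselves. The sketch below shows one common approach, sampling documents and recording the types seen for each field. It is an illustration under that assumption, not the actual logic behind ‘Refresh Schema’; the function name `infer_schema`, the `sample_size` parameter, and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def infer_schema(documents, sample_size=100):
    """Infer field types from up to `sample_size` documents.

    Works on any iterable of dicts (a list, or a driver cursor).
    A field is marked optional if some sampled documents lack it.
    """
    sample = list(islice(documents, sample_size))
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in sample:
        for field, value in doc.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {
        field: {"types": sorted(types[field]), "optional": counts[field] < len(sample)}
        for field in types
    }


# Hypothetical documents: 'price' has mixed types and is missing from one document.
docs = [
    {"_id": 1, "name": "apple", "price": 0.5},
    {"_id": 2, "name": "pear", "price": 1},
    {"_id": 3, "name": "plum"},
]
print(infer_schema(docs))
```

Sampling keeps inference cheap on large collections, at the cost of possibly missing fields that only appear in unsampled documents.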
When we execute the workflow, it reads the Sample collection from MongoDB and displays the first few rows.
We can see that the sample records written to MongoDB in the first workflow are read back.
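One way to confirm such a round trip programmatically is to compare the written and read-back records while ignoring the `_id` field that MongoDB adds to each inserted document that lacks one, and ignoring order. The sketch below assumes flat documents with hashable values; the helper names are hypothetical, and the string `_id` values stand in for the ObjectId values a real read would return.

```python
from collections import Counter


def strip_id(doc: dict) -> dict:
    """Return a copy of a document without MongoDB's _id field."""
    return {k: v for k, v in doc.items() if k != "_id"}


def same_records(written, read_back) -> bool:
    """True if both sides hold the same records, ignoring _id and order."""

    def fingerprint(docs):
        # Sorting items by key gives each flat document a canonical, hashable form.
        return Counter(tuple(sorted(strip_id(d).items())) for d in docs)

    return fingerprint(written) == fingerprint(read_back)


written = [{"id": 1, "name": "apple"}, {"id": 2, "name": "pear"}]
read_back = [
    {"_id": "stand-in-id-2", "id": 2, "name": "pear"},
    {"_id": "stand-in-id-1", "id": 1, "name": "apple"},
]
print(same_records(written, read_back))  # True
```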