May 10, 2019 · 1 min read

Reading and Writing from MongoDB

MongoDB is a document database that combines scalability and flexibility with rich querying and indexing. Here we load data from HDFS and save it into MongoDB, then read it back.

Workflow for Loading data into MongoDB

The workflow below reads the Sample dataset, which is stored in CSV format, from HDFS.

It then saves the data into MongoDB.

Workflow saves the data into MongoDB.

The diagram below shows the dialog box for the SaveMongoDB Processor.

Dialog box for the SaveMongoDB Processor

Workflow Execution

When we execute the Workflow, it reads in the dataset from HDFS and loads it into MongoDB.
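Outside the visual workflow, the same load step can be sketched in plain Python. This is only an illustration, not how the SaveMongoDB Processor is implemented: it assumes the CSV text has already been fetched from HDFS, and the function names, URI, database name, and sample rows are all hypothetical. The insert relies on the `pymongo` package.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Turn CSV text with a header row into a list of dicts, one per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def save_documents(documents, uri, database, collection):
    """Insert the documents into MongoDB and return how many were written."""
    # Imported lazily so the CSV parsing above works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        result = client[database][collection].insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()


# Hypothetical sample rows standing in for the CSV file read from HDFS.
sample_csv = "id,name,price\n1,apple,0.5\n2,pear,0.75\n"
docs = csv_to_documents(sample_csv)

# Example call (assumed local server, database, and collection names):
# save_documents(docs, "mongodb://localhost:27017", "demo", "Sample")
```

Note that `csv.DictReader` leaves every value as a string, so numeric columns would need to be converted before saving if typed fields are wanted in MongoDB.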

Workflow for Reading data from MongoDB

The workflow below reads the data from MongoDB and then prints it.

Workflow reads the data from MongoDB and then prints it

The diagram below shows the dialog box for the ReadMongoDB Processor.

ReadMongoDB Processor

In the above dialog, the ‘Refresh Schema’ button infers the schema of the collection. The Processor can then pass its output schema down to the next Processor, which makes it easier for us to build the workflow.
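To give a feel for what schema inference over a document collection involves, here is a minimal sketch. It is not the logic behind the ‘Refresh Schema’ button, which the post does not describe. It simply scans a few sample documents and records the Python type names seen for each field. The function name and sample documents are hypothetical.

```python
def infer_schema(documents):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = {}
    for doc in documents:
        for field, value in doc.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical documents; the second one carries an extra field.
sample_docs = [
    {"id": 1, "name": "apple", "price": 0.5},
    {"id": 2, "name": "pear", "price": 0.75, "tags": ["fruit"]},
]
schema = infer_schema(sample_docs)
print(schema)
```

Because documents in one collection need not share the same fields, a field can appear in only some documents or with more than one type. That is why the sketch keeps a list of types per field rather than a single type.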

Workflow Execution

When we execute the Workflow, it reads in the Sample collection from MongoDB and displays the first few lines.

We see that the Sample data records we wrote to MongoDB in the first workflow are now read back.

The Sample data records are read back
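The read-back step can be sketched in Python with `pymongo` as well. As before, this is an illustration under assumptions rather than the ReadMongoDB Processor's implementation: the helper names, URI, and database name are hypothetical, and only the collection name Sample comes from the workflow above.

```python
def first_documents(collection, n=5):
    """Return up to n documents from a pymongo-style collection, omitting _id."""
    # An empty filter matches every document; the projection hides the _id field.
    # n should be positive: pymongo treats limit(0) as "no limit".
    cursor = collection.find({}, {"_id": 0}).limit(n)
    return list(cursor)


def read_sample(uri, database, collection, n=5):
    """Connect to MongoDB and fetch the first few documents of a collection."""
    # Imported lazily so first_documents can be exercised without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return first_documents(client[database][collection], n)
    finally:
        client.close()


# Example call (assumed local server and database name):
# for doc in read_sample("mongodb://localhost:27017", "demo", "Sample"):
#     print(doc)
```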
