Some use cases call for a focus on Data Quality. For example, if the number of records being processed is less than a certain number, execution should stop.
Fire Insights now supports 2 new Processors:
NodeCount counts the number of records in the Dataset and stores it in a variable in the JobContext.
NodeAssert allows the user to provide a conditional expression to be evaluated. NodeAssert has 2 outputs. Based on the result of evaluating the expression, execution is sent to one of the outputs.
The conditional expression can use variables generated prior to it.
Below is a workflow that uses the 2 new Processors: NodeCount and NodeAssert.
The workflow does the following:
Reads in the NYC Trip Data.
Finds the number of incoming records.
Evaluates whether the number of incoming records is greater than 100.
If it is greater than 100, it saves the dataset in Parquet format.
Otherwise, it prints the records.
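The steps above can be sketched in plain Python to make the control flow concrete. This is only an illustration of the count-then-assert pattern, not the Fire Insights implementation: the function names `node_count`, `node_assert`, and `run_workflow`, the use of a dictionary as the JobContext, and the in-memory list of records are all assumptions made for this example. Saving to Parquet is represented by a placeholder return value rather than a real write.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def node_count(records: List[Record], job_context: Dict[str, Any], var_name: str) -> List[Record]:
    """Store the number of records under var_name and pass the records through."""
    job_context[var_name] = len(records)
    return records


def node_assert(job_context: Dict[str, Any], condition: Callable[[Dict[str, Any]], bool]) -> int:
    """Return 0 when the condition holds (first output), 1 otherwise (second output)."""
    return 0 if condition(job_context) else 1


def run_workflow(records: List[Record], threshold: int = 100) -> str:
    """Count the records, test the count, and follow one of two branches."""
    job_context: Dict[str, Any] = {}  # hypothetical stand-in for the JobContext
    records = node_count(records, job_context, "cnt")
    output = node_assert(job_context, lambda ctx: ctx["cnt"] > threshold)
    if output == 0:
        # Placeholder: a real workflow would write the dataset as Parquet here.
        return "save_parquet"
    for row in records:
        print(row)
    return "print_records"
```

Calling `run_workflow` with 101 records follows the save branch, while exactly 100 records follows the print branch, because the condition is strictly greater than 100.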
Below is the configuration of the NodeCount Processor.
It counts the incoming records and stores the result in the variable cnt in the JobContext.
Below is the configuration of the NodeAssert Processor.
It evaluates whether the value of cnt is greater than 100.
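To illustrate how a condition such as cnt > 100 can be checked against previously stored variables, here is a minimal Python sketch. The expression syntax accepted by NodeAssert is not described here, so the format assumed below (a variable name, a comparison operator, and an integer, separated by spaces) and the function name `evaluate_condition` are hypothetical.

```python
import operator
from typing import Any, Dict

# Comparison operators supported by this sketch (an assumption, not the product's list).
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def evaluate_condition(expression: str, variables: Dict[str, Any]) -> bool:
    """Evaluate '<variable> <operator> <integer>' against the given variables."""
    parts = expression.split()
    if len(parts) != 3:
        raise ValueError(f"expected '<variable> <operator> <integer>', got: {expression!r}")
    name, op, literal = parts
    if op not in _OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    if name not in variables:
        raise KeyError(f"variable not set before the assertion: {name!r}")
    return _OPS[op](variables[name], int(literal))
```

For example, `evaluate_condition("cnt > 100", {"cnt": 250})` returns `True`, and the same expression with `{"cnt": 100}` returns `False`. Parsing the expression explicitly, rather than passing it to `eval`, keeps the sketch from executing arbitrary code.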