Sparkflows has a couple of nodes for splitting the incoming DataFrame. One splits it into two DataFrames according to a specified percentage. This is useful for Machine Learning workflows, for example when creating training and test sets.
The other is to split the incoming DataFrame based on an expression. Rows satisfying the expression go into one DataFrame and the rest go into another DataFrame.
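The semantics of these two splits can be sketched in plain Python over a list of row dictionaries. This is only an illustration, not Sparkflows code: the function names `split_by_percentage` and `split_by_expression`, the `seed` parameter, and the sample rows are all hypothetical. In Spark itself, the first split would typically correspond to `DataFrame.randomSplit` and the second to a pair of complementary `filter` calls.

```python
import random


def split_by_percentage(rows, percent, seed=42):
    """Randomly send roughly `percent`% of the rows to the first output.

    Hypothetical helper: the split is random per row, so the sizes are
    approximate rather than exact.
    """
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        target = first if rng.random() * 100 < percent else second
        target.append(row)
    return first, second


def split_by_expression(rows, predicate):
    """Rows satisfying `predicate` go to the first output, the rest to the second."""
    matched = [row for row in rows if predicate(row)]
    rest = [row for row in rows if not predicate(row)]
    return matched, rest


# Hypothetical sample data: ten rows with ages 20 through 29.
rows = [{"id": i, "age": 20 + i} for i in range(10)]

train_rows, test_rows = split_by_percentage(rows, 80)
over_25, others = split_by_expression(rows, lambda r: r["age"] > 25)
```

In both cases every input row lands in exactly one of the two outputs.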
We recently ran into a requirement for splitting the incoming DataFrame by multiple conditions. So we added a new Node called 'SplitByMultipleExpressions'. It allows users to specify up to 5 conditional expressions. The rows satisfying each expression are routed to that expression's own output path.
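A minimal sketch of how such a multi-way split might behave, written in plain Python rather than as Sparkflows or Spark code. The function name `split_by_multiple_expressions`, the `MAX_EXPRESSIONS` constant, and the sample data are hypothetical. The sketch also assumes that each expression is evaluated independently, so a row satisfying two expressions would appear in both outputs and a row satisfying none would be dropped. How the actual Node handles those cases is not stated here.

```python
MAX_EXPRESSIONS = 5  # the limit mentioned for the current implementation


def split_by_multiple_expressions(rows, predicates):
    """Return one output list per predicate, in the same order.

    Assumption: each predicate acts as an independent filter over all rows.
    """
    if not 1 <= len(predicates) <= MAX_EXPRESSIONS:
        raise ValueError(f"expected 1 to {MAX_EXPRESSIONS} expressions")
    return [[row for row in rows if predicate(row)] for predicate in predicates]


# Hypothetical sample data.
rows = [
    {"name": "a", "age": 12},
    {"name": "b", "age": 30},
    {"name": "c", "age": 70},
    {"name": "d", "age": 45},
]

# Three expressions produce three outputs.
children, adults, seniors = split_by_multiple_expressions(
    rows,
    [
        lambda r: r["age"] < 18,
        lambda r: 18 <= r["age"] < 65,
        lambda r: r["age"] >= 65,
    ],
)
```

Because the three expressions here are mutually exclusive and cover every age, each row ends up in exactly one output.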
Below is a workflow which exercises the new Node.
In the above workflow, the Node 'Split By Multiple Expressions' splits the incoming DataFrame into 3 output DataFrames. The current implementation supports splitting up to 5 ways.
The 3 output DataFrames look like this: