Within the Big Data Ecosystem, Sparkflows excels at the following core capabilities:
- Building Data Pipelines
- Data Preparation
- Complex Feature Generation & Machine Learning
- Streaming Analytics
The diagram above depicts where Sparkflows is positioned in the Big Data Ecosystem. Sparkflows allows building applications from reusable building blocks, which makes it complementary to the other products in the ecosystem.
For example, if we had to take a dataset, build a Random Forest classification model, and then score and evaluate that model, the workflow could be built in Sparkflows in minutes and run on the cluster. The same workflow can just as easily be run from the command line.
Sparkflows is also a great fit for many of the day-to-day needs of a data engineer or data scientist. For example, taking a dataset, applying certain transforms, and loading the results into HBase, Solr, Elasticsearch, HDFS, etc. would take a substantial amount of time to code, compile, and deploy. With Sparkflows, it is a matter of minutes to build the workflow, map the columns appropriately, and start running it.
Across the software development lifecycle, initial development is often less than 20% of the overall effort; the rest goes into maintaining and extending the system. Sparkflows really shines in this respect, as it makes it much easier to visualize a workflow at any time and make the relevant changes to it. With Sparkflows, it is easy to hand over processing built by one analyst or developer to another.