
Feature Generation with Sparkflows for your ML experiments


Data scientists use a variety of languages and tools for machine learning: R, Python, Scala, Spark, H2O, SAS, and others.

Feature generation is one of the steps that takes a significant amount of time. Sparkflows provides a powerful set of capabilities for it.

Sparkflows has approximately 60 Processors for machine learning, including modeling algorithms for regression, classification, and clustering, as well as processors for complex feature generation.

Feature Generation

Feature Generation capabilities in Sparkflows include:

  • SQL

  • NLP

  • OCR

  • Text

      ◦ TF-IDF

      ◦ Tokenizer

      ◦ StopWordsRemover

      ◦ Convert case

  • Bucketizer

  • Imputer

  • A number of ETL operations

      ◦ JOIN

      ◦ RowFilter

      ◦ ColumnFilter

      ◦ Union

      ◦ Drop Rows with Null

  • A number of date operations

      ◦ Date to Age

      ◦ Extract Month, Hour, Minute, Second

      ◦ Date Difference

  • A number of math functions

  • Statistics

      ◦ Summary of various columns

      ◦ Correlation between columns
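The Text processors listed above (Tokenizer, StopWordsRemover, Convert case, TF-IDF) chain naturally into one pipeline. The processor names match Spark ML transformers of the same name, but this post does not say how Sparkflows implements them, so what follows is a plain-Python sketch of the computation only, not Sparkflows' API. The stop-word list and the regex tokenization rule are illustrative assumptions; the smoothed IDF, log((N + 1) / (df + 1)), follows the formula documented for Spark ML's IDF.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Convert case to lower, then split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [tok for tok in tokens if tok not in stop_words]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    TF is the raw count of the term in the document. IDF is the smoothed
    log((N + 1) / (df + 1)), where N is the number of documents and df is
    the number of documents that contain the term.
    """
    token_lists = [remove_stop_words(tokenize(doc)) for doc in documents]
    n_docs = len(token_lists)

    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights.append({
            term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight log(1) = 0, which is how TF-IDF down-weights words that do not help distinguish one document from another.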
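Bucketizer and Imputer are also the names of Spark ML transformers. As a hedged, plain-Python illustration of what these two steps compute (not how Sparkflows runs them), this sketch maps a numeric column to bucket indices using sorted split points, in the spirit of Spark's Bucketizer, and fills missing values with the column mean or median. The split points and the use of None as the missing-value marker are assumptions made for the example.

```python
import bisect
import statistics


def bucketize(values, splits):
    """Map each value to a bucket index, given sorted split points.

    Bucket i covers [splits[i], splits[i + 1]); the last bucket also
    includes its upper bound. Values outside the split range raise ValueError.
    """
    result = []
    for v in values:
        if v < splits[0] or v > splits[-1]:
            raise ValueError(f"{v} is outside the split range")
        idx = bisect.bisect_right(splits, v) - 1
        # A value equal to the last split belongs to the last bucket.
        result.append(min(idx, len(splits) - 2))
    return result


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the non-missing values."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical age bands: [0, 18), [18, 65), [65, 120].
print(bucketize([5, 18, 40, 120], [0, 18, 65, 120]))  # [0, 1, 1, 2]
print(impute([1.0, None, 3.0]))                        # [1.0, 2.0, 3.0]
```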
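The ETL operations listed above correspond to familiar relational steps. A minimal sketch in plain Python shows what each one does, treating a table as a list of row dictionaries; that representation is an assumption made only for illustration, since Sparkflows works on Big Data rather than in-memory lists. The helper names and table contents are hypothetical.

```python
def join(left, right, left_key, right_key):
    """Inner join: pair each left row with every right row whose key matches."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[left_key], [])
    ]


def row_filter(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def column_filter(rows, columns):
    """Keep only the named columns, in the given order."""
    return [{c: row[c] for c in columns} for row in rows]


def union(*tables):
    """Append tables with the same columns; duplicates are kept, as in SQL UNION ALL."""
    return [row for table in tables for row in table]


def drop_rows_with_null(rows):
    """Remove any row that has a None value in any column."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical example data.
customers = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Boston"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.0},
    {"customer_id": 3, "amount": 10.0},
]

joined = join(customers, orders, "id", "customer_id")
cleaned = drop_rows_with_null(joined)
large = row_filter(cleaned, lambda row: row["amount"] >= 50)
result = union(column_filter(large, ["city", "amount"]), [{"city": "Denver", "amount": 60.0}])
print(result)  # [{'city': 'Austin', 'amount': 120.0}, {'city': 'Denver', 'amount': 60.0}]
```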
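The date operations listed above can be sketched with Python's standard datetime module. This illustrates the calculations under stated assumptions and is not Sparkflows' implementation: age is counted in whole years (one less if the birthday has not yet occurred in the as-of year), and date difference is measured in whole days. The function names are hypothetical.

```python
from datetime import date, datetime


def date_to_age(birth_date, as_of):
    """Whole years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def extract_parts(ts):
    """Split a timestamp into month, hour, minute and second features."""
    return {"month": ts.month, "hour": ts.hour, "minute": ts.minute, "second": ts.second}


def date_difference(start, end):
    """Number of days from start to end (negative if end is earlier)."""
    return (end - start).days


print(date_to_age(date(1990, 6, 15), date(2024, 6, 14)))      # 33
print(extract_parts(datetime(2024, 3, 9, 14, 5, 30)))
print(date_difference(date(2024, 1, 1), date(2024, 3, 1)))   # 60 (2024 is a leap year)
```

Passing a date column through functions like these turns one raw timestamp into several numeric features (age, month, hour, elapsed days) that most modeling tools can consume directly.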
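The Statistics operations, a summary of columns and correlation between columns, can be illustrated with Python's statistics module. The summary fields here (count, mean, sample standard deviation, min, max) are chosen to mirror what Spark's DataFrame.describe() reports; which statistics Sparkflows actually displays is not specified in this post. Correlation is computed as the Pearson coefficient, which is an assumption, since rank-based measures such as Spearman are also common.

```python
import math
import statistics


def summarize(column):
    """Count, mean, sample standard deviation, min and max of a numeric column."""
    return {
        "count": len(column),
        "mean": statistics.mean(column),
        "stddev": statistics.stdev(column),  # sample (n - 1) standard deviation
        "min": min(column),
        "max": max(column),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length, at least 2 values each")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
print(pearson([1, 2, 3], [2, 4, 6]))
```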

Sparkflows' feature generation capabilities make it possible to generate features on Big Data with Sparkflows and then use the other tools of your choice for modeling.

