Data scientists use a variety of languages and tools for machine learning: R, Python, Scala, Spark, H2O, SAS, and others.
Feature generation is one step that takes a significant amount of time, and Sparkflows provides a powerful set of capabilities for it.
Sparkflows has approximately 60 Processors for machine learning, including modeling algorithms for regression, classification, and clustering, as well as complex feature generation.
Feature Generation
Feature Generation capabilities in Sparkflows include:
SQL
NLP
OCR
Text
  TF-IDF
  Tokenizer
  StopWordsRemover
  Convert case
Bucketizer
Imputer
A number of ETL operations
  JOIN
  RowFilter
  ColumnFilter
  Union
  Drop Rows with Null
A number of date operations
  Date to Age
  Extract Month, Hour, Minute, Second
  Date Difference
A number of math functions
Statistics
  Summary of various columns
  Correlation between columns
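To make the text entries in the list above concrete, here is a minimal plain-Python sketch of what tokenizing, converting case, removing stop words, and computing TF-IDF produce. It is not Sparkflows code and does not use any Sparkflows API: the function names, the tiny stop-word list, and the sample sentences are all assumptions made for illustration. The IDF uses the smoothed form log((N + 1) / (df + 1)) described in the Spark MLlib documentation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "of", "and", "on"}


def tokenize(text):
    """Lower-case the text (convert case) and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw term count; IDF is log((N + 1) / (df + 1)),
    where N is the number of documents and df the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    result = []
    for tokens in docs:
        counts = Counter(tokens)
        result.append({
            term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        })
    return result


# Made-up sample documents.
raw = [
    "The cat sat on the mat",
    "The dog chased the cat",
    "A bird is on the roof",
]
docs = [remove_stop_words(tokenize(text)) for text in raw]
weights = tf_idf(docs)
```

In this sample, "cat" appears in two of the three documents and therefore gets a lower weight than "sat", which appears in only one. That is the property that makes TF-IDF useful as a feature.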
This makes it possible to use Sparkflows for feature generation on Big Data and then use other tools for modeling.
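The hand-off from feature generation to a separate modeling tool can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not how Sparkflows implements it: the helpers `bucketize`, `impute_mean`, and `date_to_age` are hypothetical stand-ins for the Bucketizer, Imputer, and Date to Age operations, the input rows are made up, and CSV is assumed as the exchange format.

```python
import csv
import io
from bisect import bisect_right
from datetime import date


def bucketize(value, splits):
    """Return the index i of the bucket [splits[i], splits[i+1]) holding value.

    Values outside the range covered by splits are not handled in this sketch.
    """
    return bisect_right(splits, value) - 1


def impute_mean(values):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def date_to_age(born, today):
    """Whole years elapsed between born and today."""
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday


# Hypothetical input rows: (id, date of birth, income with a missing value).
rows = [
    (1, date(1980, 6, 15), 52000.0),
    (2, date(1995, 1, 2), None),
    (3, date(2001, 12, 31), 38000.0),
]
as_of = date(2020, 6, 1)            # fixed reference date so results are repeatable
age_splits = [0, 18, 30, 50, 120]   # assumed bucket boundaries

incomes = impute_mean([income for _, _, income in rows])
features = []
for (row_id, born, _), income in zip(rows, incomes):
    age = date_to_age(born, as_of)
    features.append((row_id, age, bucketize(age, age_splits), income))

# Write the feature table as CSV so that another tool can read it for modeling.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "age", "age_bucket", "income"])
writer.writerows(features)
csv_text = buffer.getvalue()
```

The resulting `csv_text` can be loaded by any of the tools mentioned at the top of this post, for example R or Python, which keeps feature generation and modeling decoupled.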