Offerings
Sparkflows offers diverse solutions for AI, Generative AI, and data engineering.
With ready-to-use vertical use cases, businesses can implement these technologies quickly and effectively.
Data Quality Assessment and Remediation
Sparkflows provides extensive Data Quality capabilities. Users can define data quality rules or build Data Quality workflows. The results and metrics of Data Quality checks are posted to a dashboard. Sparkflows provides powerful ways to remove duplicates, including fuzzy deduplication, and enables finding, tagging, and removing outliers. Null values can be handled with various imputation methods, and records that fail data quality checks can be filtered and saved.
Data Quality Rules

Provides Data Quality Rules for datasets and automatically suggests rules based on the data

Trends and Dashboards
Data Quality Trends & Dashboards

Scheduling
Schedules the Data Quality Jobs

Collaborate
Enables teams to collaborate on Data Quality rules and results

Data Remediation
Provides automatic remediation actions
Alerts & Notifications

Generates Alerts & Notifications when the data quality score falls below the threshold

Specify Weightage for each Rule
- Ability to specify weightage for each Rule
- Ability to specify threshold for the Data Quality Index
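The weightage and threshold ideas above can be sketched as a weighted average of per-rule pass rates. The rule names, weights, and 0.95 threshold below are hypothetical illustrations, not Sparkflows defaults.

```python
# Hedged sketch of a weighted Data Quality Index: each rule reports a
# pass rate in [0, 1] and carries a user-assigned weight. Rule names,
# weights, and the threshold are assumed for illustration.

def data_quality_index(rule_results, weights):
    """Weighted average of per-rule pass rates."""
    total = sum(weights[name] for name in rule_results)
    return sum(rate * weights[name] for name, rate in rule_results.items()) / total

rule_results = {"completeness": 0.98, "uniqueness": 0.90, "range_check": 1.0}
weights = {"completeness": 3, "uniqueness": 2, "range_check": 1}

dqi = data_quality_index(rule_results, weights)
print(round(dqi, 4))
print(dqi >= 0.95)  # alert/notify when the score falls below the threshold
```

A dataset whose weighted score drops below the configured threshold would then trigger the alerts and notifications described above.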

AI Generated Rules
- Rules are suggested by running data profiling and ML models
- Users can choose rules to accept

Deduplicate Records
- Remove duplicate records with fuzzy matching
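Fuzzy deduplication can be illustrated with the standard library's string-similarity ratio; the records and the 0.85 threshold are assumptions for this sketch, and Sparkflows' own fuzzy dedup may use different matching logic.

```python
# Minimal fuzzy-deduplication sketch using difflib's similarity ratio.
# Threshold and sample records are hypothetical illustrations.
from difflib import SequenceMatcher

def fuzzy_dedup(records, threshold=0.85):
    """Keep a record only if it is not too similar to an already-kept one."""
    kept = []
    for rec in records:
        if all(SequenceMatcher(None, rec.lower(), k.lower()).ratio() < threshold
               for k in kept):
            kept.append(rec)
    return kept

names = ["Acme Corp.", "ACME Corp", "Acme Corporation", "Globex Inc."]
print(fuzzy_dedup(names))
```

Here "ACME Corp" is dropped as a near-duplicate of "Acme Corp.", while the longer "Acme Corporation" survives the threshold.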
Integration with Great Expectations
- Integration with Great Expectations
- OpenAI has indexed GE Rules
Data Quality
Validation Rules
Data Masking
Imputation
Outlier detection
Anomaly patterns
Relationship Discovery
Deduplication
Scheduling
Data Quality Dashboard



Data Quality Details for Datasets


Data Quality Rules
- id: must be the primary key
- price: values between 100 and 1500
- pattern matching: dt matches the pattern \d{4}-\d{2}-\d{2}
- completeness: must never be null
- is nonnegative: must not contain negative values
- values in: contains only listed values
- max and min: check for maximum and minimum values
- size: check the data size
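The rule types listed above can be expressed as simple checks over rows. The column names mirror the examples (id, price, dt), but the check functions themselves are a conceptual sketch, not Sparkflows APIs.

```python
# Hedged sketch of the listed rule types as plain Python checks over
# a list of row dicts. Sample rows are illustrative.
import re

rows = [
    {"id": 1, "price": 250, "dt": "2024-01-15"},
    {"id": 2, "price": 1400, "dt": "2024-02-03"},
]

def primary_key(rows, col):
    """Values must be unique and never null."""
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals)) and None not in vals

def in_range(rows, col, lo, hi):
    """Values must fall between lo and hi inclusive."""
    return all(lo <= r[col] <= hi for r in rows)

def matches(rows, col, pattern):
    """Values must fully match the given regex pattern."""
    return all(re.fullmatch(pattern, r[col]) for r in rows)

def complete(rows, col):
    """Values must never be null."""
    return all(r[col] is not None for r in rows)

checks = {
    "id is primary key": primary_key(rows, "id"),
    "price in 100-1500": in_range(rows, "price", 100, 1500),
    "dt matches date pattern": matches(rows, "dt", r"\d{4}-\d{2}-\d{2}"),
    "price is complete": complete(rows, "price"),
}
print(all(checks.values()))
```

The per-check booleans (or pass rates over larger datasets) are the kind of metrics that would feed the Data Quality Dashboard.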

AI Generated Rules
Run a set of profiling and ML algorithms and come up with recommendations for rules for the Dataset
- Drop Null
- Impute with different types
- Remove Duplicates
- Remove Outliers
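The remediation actions above can be sketched on a single numeric column in plain Python. Sparkflows runs these as Spark workflows in the backend; the data, the mean-imputation choice, and the two-standard-deviation outlier cutoff here are all assumptions for illustration.

```python
# Conceptual stand-in for the remediation actions: drop nulls, impute,
# deduplicate, and remove outliers. Data and cutoffs are hypothetical.
from statistics import mean, stdev

values = [10, 12, None, 11, 12, 300, 10]

# Drop Null
no_nulls = [v for v in values if v is not None]

# Impute (mean imputation, one of several possible strategies)
m = mean(no_nulls)
imputed = [v if v is not None else m for v in values]

# Remove Duplicates (order-preserving)
deduped = list(dict.fromkeys(no_nulls))

# Remove Outliers (values more than two standard deviations from the mean)
mu, sigma = mean(no_nulls), stdev(no_nulls)
inliers = [v for v in no_nulls if abs(v - mu) <= 2 * sigma]
print(no_nulls, deduped, inliers)
```

Each step maps to one suggested action; in practice the chosen strategy (e.g. which imputation type, which outlier rule) would come from the accepted rules.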

Data Quality and Profiling are powered by workflows in the Backend

Data Quality is also integrated with Great Expectations.

Discover the advanced data quality capabilities of Sparkflows