Data Quality Accelerator
Source: Databricks. Indexed 2024-07-06.
Download link:
https://marketplace.databricks.com/details/b29e4ef9-e614-4336-89d2-b31a12c8748c/Pingahla-Inc_Data-Quality-Accelerator
Resource description:
**Overview**
This data quality project implements a series of data quality rules to ensure the integrity and reliability of the data. It supports several types of data quality checks: accuracy, anomaly detection, completeness, uniqueness, and validity. The rules are applied at different levels, such as columns and tables, to provide a comprehensive assessment of the data.
To run Pingahla's Data Quality Accelerator Demo, execute the provided notebook. You will be prompted to select an Output Connection from the available options. Next, enter the credentials for each connection that holds your data (Databricks, AWS, Azure). When the run completes, the results are stored in the Output Connection database, where they can be viewed on a dashboard.
**Use cases**
- Data Integrity Verification: Ensure that the data adheres to predefined rules for accuracy, completeness, and consistency.
- Anomaly Detection: Identify and flag potential anomalies in the data using methods like Isolation Forest.
- Data Standardization: Apply rules to maintain consistency in data formats, such as email addresses and phone numbers.
- Schema Validation: Check for duplicate natural keys to maintain data integrity.
- Cleansing: Use the data quality results to cleanse the records that did not comply with the rules.
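
The kinds of checks listed above can be illustrated with a minimal sketch in plain Python. This is not Pingahla's implementation: the function names (`completeness`, `duplicate_keys`, `validity`, `failing_rows`), the sample rows, and the simplified email pattern are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "not-an-email"},
    {"customer_id": 3, "email": "li@example.org"},
]

# Deliberately simplified email pattern (assumption), not a full RFC 5322 check.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def completeness(rows, column):
    """Fraction of rows whose value in `column` is not None."""
    if not rows:
        return 1.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def duplicate_keys(rows, key):
    """Sorted key values that appear more than once (uniqueness check)."""
    counts = Counter(r.get(key) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def validity(rows, column, pattern):
    """Fraction of non-null values in `column` that fully match `pattern`."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return sum(bool(pattern.fullmatch(v)) for v in values) / len(values)


def failing_rows(rows, column, pattern):
    """Rows a cleansing step would target: null or invalid values."""
    return [r for r in rows
            if r.get(column) is None or not pattern.fullmatch(r[column])]


print("completeness:", completeness(rows, "email"))
print("duplicate keys:", duplicate_keys(rows, "customer_id"))
print("validity:", validity(rows, "email", EMAIL_RE))
print("rows to cleanse:", len(failing_rows(rows, "email", EMAIL_RE)))
```

Each check returns a plain value rather than raising, so the scores could be written to an output table and the failing rows passed on to a cleansing step. Anomaly detection with Isolation Forest is left out of this sketch because it needs a third-party library.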
**Additional Insights**
This data quality project is designed to be integrated with multiple platforms including Databricks, AWS, and Azure. Users can configure the project through a JSON configuration file, enabling flexibility and customization according to their specific needs.
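
The listing does not publish the configuration schema, so the following is only a sketch of what a JSON-driven rule setup could look like. Every key shown (`output_connection`, `rules`, `type`, `level`, `target`, `threshold`, `keys`) and the `load_rules` helper are hypothetical. Only the rule types and the column/table levels come from the description above.

```python
import json

# Hypothetical configuration; the accelerator's real schema is not documented here.
CONFIG_TEXT = """
{
  "output_connection": "databricks",
  "rules": [
    {"type": "completeness", "level": "column",
     "target": "orders.email", "threshold": 0.95},
    {"type": "uniqueness", "level": "table",
     "target": "orders", "keys": ["order_id"]}
  ]
}
"""

# Rule types and levels named in the project description.
ALLOWED_TYPES = {"accuracy", "anomaly", "completeness", "uniqueness", "validity"}
ALLOWED_LEVELS = {"column", "table"}


def load_rules(text):
    """Parse the JSON text and return its rules, rejecting unknown types or levels."""
    config = json.loads(text)
    rules = config.get("rules", [])
    for rule in rules:
        if rule.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unknown rule type: {rule.get('type')!r}")
        if rule.get("level") not in ALLOWED_LEVELS:
            raise ValueError(f"unknown rule level: {rule.get('level')!r}")
    return rules


rules = load_rules(CONFIG_TEXT)
for rule in rules:
    print(rule["type"], rule["level"], rule["target"])
```

Validating the configuration when it is loaded, before any check runs, reports a mistyped rule name immediately instead of partway through a run.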
Provider:
Pingahla Inc



