
Data Quality Accelerator

Databricks, indexed 2024-07-06
Download link:
https://marketplace.databricks.com/details/b29e4ef9-e614-4336-89d2-b31a12c8748c/Pingahla-Inc_Data-Quality-Accelerator
Description:
**Overview**

This data quality project implements a series of data quality rules to ensure the integrity and reliability of the data. It supports several types of checks: accuracy, anomaly detection, completeness, uniqueness, and validity. The rules are applied at different levels, such as columns and tables, to provide a comprehensive assessment of the data.

To run Pingahla's Data Quality Accelerator Demo, execute the provided notebook. You will be prompted to select an Output Connection from the available options. Next, enter the credentials for each connection that holds your data (Databricks, AWS, Azure). Upon completion, your data is stored in the Output Connection database, where it can be viewed on a dashboard.

**Use cases**

- Data Integrity Verification: Ensure that the data adheres to predefined rules for accuracy, completeness, and consistency.
- Anomaly Detection: Identify and flag potential anomalies in the data using methods such as Isolation Forest.
- Data Standardization: Apply rules that keep data formats consistent, for example for email addresses and phone numbers.
- Schema Validation: Check for duplicate natural keys to maintain data integrity.
- Cleansing: Use the data quality results to cleanse the records that did not comply with the rules.

**Additional Insights**

This data quality project is designed to integrate with multiple platforms, including Databricks, AWS, and Azure. Users configure the project through a JSON configuration file, which allows flexibility and customization according to their specific needs.
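To make the idea of JSON-configured rules concrete, here is a minimal sketch in plain Python. It is not the accelerator's implementation: the listing does not document the real configuration schema, so the keys (`table`, `rules`, `type`, `column`, `pattern`), the function names `failing_rows` and `run_checks`, and the sample rows are all assumptions made for illustration. It covers only column-level completeness, uniqueness, and validity checks. Anomaly detection is left out.

```python
import json
import re

# Hypothetical configuration. The accelerator's actual JSON schema is not
# shown in this listing, so every key below is an assumption.
CONFIG_JSON = r"""
{
  "table": "customers",
  "rules": [
    {"type": "completeness", "column": "email"},
    {"type": "uniqueness", "column": "customer_id"},
    {"type": "validity", "column": "email",
     "pattern": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"}
  ]
}
"""

# Made-up sample data, used only to exercise the checks.
SAMPLE_ROWS = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 2, "email": "not-an-email"},
    {"customer_id": 3, "email": "li@example.org"},
]


def failing_rows(rows, rule):
    """Return the indices of the rows that violate a single rule."""
    column = rule["column"]
    kind = rule["type"]

    if kind == "completeness":
        # A value counts as missing when it is None or an empty string.
        return [i for i, row in enumerate(rows)
                if row.get(column) in (None, "")]

    if kind == "uniqueness":
        # Flag every repeat of a value that has already been seen.
        seen = set()
        failures = []
        for i, row in enumerate(rows):
            value = row.get(column)
            if value in seen:
                failures.append(i)
            seen.add(value)
        return failures

    if kind == "validity":
        # Missing values are skipped here; the completeness rule reports them.
        pattern = re.compile(rule["pattern"])
        return [i for i, row in enumerate(rows)
                if row.get(column) not in (None, "")
                and not pattern.fullmatch(str(row[column]))]

    raise ValueError(f"unknown rule type: {kind}")


def run_checks(config, rows):
    """Apply each configured rule and summarize the outcome per rule."""
    results = []
    for rule in config["rules"]:
        failed = failing_rows(rows, rule)
        results.append({
            "rule": rule["type"],
            "column": rule["column"],
            "failed_rows": failed,
            "pass_rate": 1 - len(failed) / len(rows) if rows else 1.0,
        })
    return results


if __name__ == "__main__":
    for result in run_checks(json.loads(CONFIG_JSON), SAMPLE_ROWS):
        print(result)
```

Keeping the rules in data rather than in code means a new check on another column is a one-line change to the JSON, which is presumably the kind of flexibility the description refers to. The row indices returned for each rule could then drive the cleansing step.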
Provider:
Pingahla Inc