
Privacy-Safe Synthetic Data in Unity Catalog for Financial Services

Databricks, indexed 2025-02-14
Download link:
https://marketplace.databricks.com/details/32db0d45-89ad-4159-b15c-f031d981998a/MOSTLY-AI_Privacy-Safe-Synthetic-Data-in-Unity-Catalog-for-Financial-Services
Resource description:
## Overview

This Databricks Solution Accelerator enables financial institutions to generate high-quality, privacy-preserving synthetic data using the **open-source Synthetic Data SDK from MOSTLY AI**, all within **Databricks**. By leveraging **Unity Catalog** for secure data management and governance, this accelerator helps financial services teams unlock access to **AI/ML training, analytics, fraud detection, and risk modeling without exposing sensitive customer information**.

---

## 📌 How to Get Started

To use this accelerator, **click "Get Instant Access"** in the top right.

- The **included notebooks** will appear in the **"Other assets"** section of the **Delta Share**, where you can **clone them into your own workspace**.
- The **financial data tables** will also be accessible in **Unity Catalog** via the Delta Share, allowing you to begin generating synthetic data immediately.

---

## 🔹 Use Cases

Financial institutions can leverage this accelerator for multiple synthetic data applications, including:

- ✔ **AI/ML Model Training** – Use privacy-preserving synthetic data to train **fraud detection, credit risk, and predictive analytics models**.
- ✔ **Self-Service Data Access** – Enable analysts and data scientists to work with **high-quality data while keeping real customer information protected**.
- ✔ **Regulatory Compliance & Data Sharing** – Securely share **financial data across teams, partners, and regulatory bodies** while maintaining full compliance.
- ✔ **Software Testing & Application Development** – Use synthetic data for **developing and testing banking applications** without production data risks.

---

## 🔹 Product Details

This accelerator provides a **streamlined, end-to-end synthetic data pipeline** for **financial services**.
It consists of **two fully parameterized, widget-driven notebooks** that allow users to:

- 📌 **Step 1**: **Train a synthetic data generator** on their financial data
- 📌 **Step 2**: **Generate synthetic datasets** that retain key relationships
- 📌 **All results are securely stored in Unity Catalog**

---

## 📂 Included Financial Data: Accounts & Transactions

This accelerator includes **preloaded financial data in Delta Share**, ensuring users can **immediately** train and generate synthetic data **without needing external datasets**.

The included tables are derived from the **Berka dataset**, a well-known **public banking dataset** that has been **preprocessed for seamless integration into Unity Catalog**:

- 🔹 **Accounts** – Represents **customer bank accounts**, including **creation dates** and **assigned districts**.
- 🔹 **Transactions** – Logs **deposits, withdrawals, and transfers** across accounts.

This structure mirrors **real-world banking data**, making it highly relevant for financial institutions needing synthetic data for:

- ✔ **AI/ML model training**
- ✔ **Data democratization**
- ✔ **Regulatory compliance**

💡 **Performance Tip:** This dataset includes **over 1 million transactions**. To ensure **quick execution while maintaining model performance**, we recommend setting the **Max Training Time** to **10 minutes**.

---

## 🛠️ Included Notebooks

### **📌 Step 1: Train a Synthetic Data Generator**

- ✔ Load financial data from **Unity Catalog**
- ✔ Train a synthetic data generator using the **open-source Synthetic Data SDK**
- ✔ Export the generator **to Unity Catalog Volumes**

### **📌 Step 2: Generate Synthetic Data**

- ✔ Use the trained generator to **produce synthetic financial datasets**
- ✔ Preserve **key relationships** and store the results in **Unity Catalog**

---

## 💡 Key Benefits for Financial Services

- ✅ **Privacy-Preserving Data** – Enables **AI/ML model training** without exposing real customer data, ensuring compliance with **GDPR, CCPA, and other regulations**.
- ✅ **No-Code, User-Friendly Setup** – Fully **widget-driven**, allowing easy setup and execution **without coding experience**.
- ✅ **Unity Catalog Integration** – Ensures **governance, auditability, and discoverability** across teams in Databricks.
- ✅ **Self-Service Data Access** – Democratizes **data access** across finance teams without manual anonymization.
- ✅ **AI/ML Model Training & Risk Modeling** – Ideal for **fraud detection, credit risk modeling, and predictive analytics**.
- ✅ **Reusable Generators** – Once trained, a **synthetic data generator can be reused indefinitely**, reducing the need for re-training.
- ✅ **Multi-Table Support** – Automatically preserves **primary and foreign key relationships** across datasets.

---

## 🔥 Scaling Beyond Open Source

While the **open-source Synthetic Data SDK** provides powerful synthetic data generation, organizations looking to scale can unlock:

- ✔ **Enterprise-grade observability & governance** – Gain **centralized visibility into synthetic data generation, usage, and performance**.
- ✔ **Robust professional services & support** – Leverage **expert guidance** for **implementation, compliance, and best practices**.

📩 **Interested in scaling synthetic data across your organization?** Contact **[hello@mostly.ai](mailto:hello@mostly.ai)** to discuss **enterprise solutions**.
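
The two-step workflow described above (train on Unity Catalog tables, export the generator to a Volume, then generate data that keeps key relationships) can be illustrated with a small, SDK-free sketch. Everything named here is an assumption for illustration only: `AcceleratorParams`, the catalog, schema, and volume names, the `account_id` column, and the toy rows are hypothetical and are not the accelerator's actual widget names or the real Berka data. The sketch shows only how three-level Unity Catalog table names and Volume paths are formed, and one way to check that generated child rows still reference existing parent rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorParams:
    """Hypothetical stand-in for the values a user would enter in notebook widgets."""
    catalog: str
    schema: str
    volume: str
    max_training_time_minutes: int = 10  # the value recommended in the Performance Tip

    def table(self, name: str) -> str:
        # Unity Catalog addresses tables with three-level names: catalog.schema.table
        return f"{self.catalog}.{self.schema}.{name}"

    def generator_path(self, filename: str) -> str:
        # Files in Unity Catalog Volumes live under /Volumes/<catalog>/<schema>/<volume>/
        return f"/Volumes/{self.catalog}/{self.schema}/{self.volume}/{filename}"


def orphaned_foreign_keys(parents, children, pk, fk):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in parent_keys]


# Assumed names; replace with your own catalog, schema, and volume.
params = AcceleratorParams(catalog="main", schema="finance_demo", volume="generators")
accounts_table = params.table("accounts")
transactions_table = params.table("transactions")
generator_file = params.generator_path("generator.zip")

# Toy rows standing in for generated output (not real or synthetic Berka data).
synthetic_accounts = [{"account_id": 1}, {"account_id": 2}]
synthetic_transactions = [
    {"trans_id": 10, "account_id": 1, "amount": 250.0},
    {"trans_id": 11, "account_id": 2, "amount": -40.0},
]

orphans = orphaned_foreign_keys(
    synthetic_accounts, synthetic_transactions, pk="account_id", fk="account_id"
)
print(accounts_table, transactions_table, generator_file, len(orphans))
```

A check like `orphaned_foreign_keys` is a quick sanity test to run after Step 2: an empty result means every generated transaction points at a generated account, which is the relationship the multi-table support is described as preserving.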

Provider:
MOSTLY AI