Synthetic Data Generator by MOSTLY AI for Databricks

Source: Databricks, indexed 2024-08-14
Download link:
https://marketplace.databricks.com/details/0fc15edc-d3c9-488d-980e-b438fcc3c162/MOSTLY-AI_Synthetic-Data-Generator-by-MOSTLY-AI-for-Databricks
Description:
***Overview***

This solution accelerator integrates MOSTLY AI's synthetic data generation capabilities into Databricks. By producing high-quality, privacy-preserving synthetic data with generative AI (GenAI), it enables safer and faster access to data for everyone.

***Description***

The accelerator consists of a series of notebooks that streamline synthetic data generation with MOSTLY AI in Databricks. The notebooks guide users through installing the required packages, setting up the configuration, training a synthetic data generator, and finally generating synthetic data. The process is abstracted so that users enter variables through widgets and run the notebooks without modifying the underlying code. The included notebooks are:

**Step 0: Training a MOSTLY AI Generator** - Install the MOSTLY AI package, initialize the client, and train a new synthetic data generator.

**Step 1: Save Generator Path, API Key Path & URL to a Unity Catalog Volume** - Save critical configuration information to a Unity Catalog volume for later use.

**Step 2: Create, Load & Register the Generator as a Model to Unity Catalog** - Create a model object, register it in Unity Catalog, and save it for future use.

**Step 3: Generate Synthetic Data from UC Model** - Load the registered model, generate synthetic data, and write it to Unity Catalog.

***Benefits***

This solution accelerator provides several benefits:

**Simplified Setup**: Users can quickly set up and use MOSTLY AI's synthetic data generators within Databricks by following a series of guided notebooks.

**User-Friendly**: Because variables are entered through widgets, direct code changes are rarely needed, which makes the process accessible even to users with limited technical expertise.
**Efficiency**: By abstracting away the complexity of data generation, the solution lets users focus on deriving insights from their data.

**Privacy-Preserving**: Synthetic data generated by MOSTLY AI is designed to preserve privacy, so it can be used in a wide range of applications without exposing sensitive information.

**Leveraging GenAI**: The solution uses generative AI to create realistic, useful synthetic data, improving data accessibility and utility across diverse use cases.

**Reusability**: Once a synthetic data generator has been created and registered, users can generate new synthetic data without repeating the initial setup steps.

***License Information***

Use of MOSTLY AI's solution accelerator within Databricks is subject to MOSTLY AI's terms and conditions. Users can sign up for the free version to get started. For detailed licensing information and commercial use, refer to the MOSTLY AI website.

***Included Notebooks***

**Step 0: Training a MOSTLY AI Generator**

This notebook installs the MOSTLY AI Python package, initializes the client with the provided API key and base URL, and trains a new synthetic data generator from a specified configuration. The generator ID created here is required by the subsequent notebooks.

*Note*: This step is only needed to create a new generator. If you already have one, skip this notebook and use the existing generator ID.

**Step 1: Save Generator Path, API Key Path & URL to a Unity Catalog Volume**

This notebook saves critical configuration information (generator ID, API key, and MOSTLY AI URL) to a Unity Catalog volume for later use.

*Note*: Follow best practices for handling API keys, such as storing them in Databricks secrets.
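The exact file layout used by the Step 1 notebook is not documented here. The following is a minimal sketch, assuming the configuration is written as a JSON file to a Unity Catalog volume directory (volume paths have the form `/Volumes/<catalog>/<schema>/<volume>/...` and can be accessed with ordinary file APIs on a cluster). In line with the note on API keys, it stores only a reference to a Databricks secret scope and key rather than the key itself. The file name `mostly_config.json`, the field names, and the `save_config`/`load_config` helpers are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file name; the accelerator's actual layout may differ.
CONFIG_FILE = "mostly_config.json"


def save_config(volume_dir: str, generator_id: str, secret_scope: str,
                secret_key: str, base_url: str) -> Path:
    """Write generator settings as JSON into a (Unity Catalog volume) directory.

    Only the location of the API key (a Databricks secret scope/key pair)
    is stored, never the key itself.
    """
    config = {
        "generator_id": generator_id,
        "api_key_secret": {"scope": secret_scope, "key": secret_key},
        "base_url": base_url,
    }
    path = Path(volume_dir) / CONFIG_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


def load_config(volume_dir: str) -> dict:
    """Read the settings back, e.g. from a later notebook."""
    return json.loads((Path(volume_dir) / CONFIG_FILE).read_text())
```

On Databricks, `volume_dir` would be a volume path such as `/Volumes/<catalog>/<schema>/<volume>/mostly`; a later notebook could call `load_config` and then resolve the actual key with `dbutils.secrets.get(scope=..., key=...)`, so the secret never lands in the volume.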
**Step 2: Create, Load & Register the Generator as a Model to Unity Catalog**

This notebook creates a model object that generates synthetic data, registers the model in Unity Catalog, and saves it for future use. Users must provide a sample configuration for the model input and an output schema for the model output.

**Step 3: Generate Synthetic Data from UC Model**

This notebook loads the registered synthetic data generator model from Unity Catalog, generates synthetic data based on the provided configuration, and writes the data to a specified location in Unity Catalog. Because this step hides the underlying complexity, users can generate synthetic data with minimal effort.

***Usage Tips***

**Widget-Based Input**: All required variables are exposed as widgets in the notebooks, so users can simply enter their values and run the notebooks without modifying the code.

**Running Only When Necessary**: Steps 0, 1, and 2 only need to be run when a new synthetic data generator has to be set up and registered. If you are reusing a generator that is already registered in Unity Catalog, you can skip these steps. Databricks users can be given only the final notebook (Step 3) and run it against a specific Unity Catalog model to generate and access synthetic data directly.

***Need Help?***

If you have questions about our solution accelerator or need assistance, please contact us at hello@mostly.ai.
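The widget-based input described under Usage Tips can be sketched with the standard Databricks widget API (`dbutils.widgets.text` and `dbutils.widgets.get`). This is a minimal sketch, not the accelerator's code: the helper `get_param`, the widget names and defaults, and the environment-variable fallback used when `dbutils` is unavailable (for example, when testing outside Databricks) are all hypothetical.

```python
import os


def get_param(name: str, default: str = "") -> str:
    """Return a notebook parameter, preferring a Databricks text widget.

    Inside a Databricks notebook, `dbutils` is predefined: the text widget is
    declared with `default` as its default value and its current value is
    returned. Outside Databricks, `dbutils` is undefined, so this hypothetical
    fallback reads an upper-cased environment variable instead.
    """
    try:
        dbutils.widgets.text(name, default)  # noqa: F821 (defined on Databricks)
        return dbutils.widgets.get(name)  # noqa: F821
    except NameError:
        return os.environ.get(name.upper(), default)


# Hypothetical parameters a Step 3-style notebook might expose.
# Unity Catalog models and tables use three-level names: catalog.schema.name
model_name = get_param("model_name", "main.default.mostly_generator")
output_table = get_param("output_table", "main.default.synthetic_data")
sample_size = int(get_param("sample_size", "1000"))
```

Keeping all inputs behind one helper means the rest of the notebook reads plain Python variables, which is what lets users change values in the widgets without touching the code.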
Provided by:
MOSTLY AI