Unstructured Data Ingestion by Tonic Textual for Databricks

Name: Unstructured Data Ingestion by Tonic Textual for Databricks
Creator: Tonic.ai
License: No description available

Databricks, listed 2025-03-14

Download link:

https://marketplace.databricks.com/details/72623feb-2807-483e-a486-59ae91bc37dc/Tonic-ai_Unstructured-Data-Ingestion-by-Tonic-Textual-for-Databricks


Resource overview:

**Overview**

Tonic Textual is a tool for preparing unstructured data, such as PDFs and Word documents, for use in AI systems. It converts these file formats into a standardized structure and enriches them with metadata to improve the accuracy of AI responses. You can learn more in the Tonic Textual documentation.

The Mosaic AI Agent Framework, a feature of Databricks, lets users build, deploy, and scale RAG applications by integrating vector search capabilities. One of the biggest challenges in building RAG systems is accurately retrieving relevant data from large, unstructured datasets. Tonic Textual addresses this by enriching data with metadata that boosts retrieval precision, leading to more accurate, trustworthy outputs. This accelerator addresses common RAG challenges such as data preparation, storage, and retrieval, and provides a solution that integrates Tonic Textual with the Databricks platform to streamline the entire process.

**Description**

This marketplace listing contains two notebooks. The first notebook ("Tonic_RAG") contains code that connects to Tonic Textual and ingests your unstructured data into Databricks Vector Search. The second notebook ("Tonic_Chain") contains the LangChain code that queries Tonic Textual for entities in a user's question and then keeps only the documents relevant to that question.

**Benefits**

- Enhanced accuracy: By filtering for relevant documents, Tonic Textual increases the likelihood that users receive precise and meaningful results from your RAG system.
- Automated data ingestion: Tonic Textual converts your documents into a RAG-friendly format, automatically transforming various file types into Markdown with no manual conversion required.
- Privacy protection: Tonic Textual's built-in data redaction safeguards sensitive information and helps prevent data leakage from your RAG documents.
- Streamlined management: Consolidate and manage documents across Databricks and S3 within a single interface for efficient ingestion.

**Included Notebooks**

- Tonic_RAG: Integrates Tonic Textual with Databricks Vector Search, enabling ingestion of unstructured data.
- Tonic_Chain: LangChain code that queries Tonic Textual to identify entities in user questions and filter for the documents relevant to each question.

**Getting Started**

Sign up for a free Tonic Textual account, then import the sample notebooks directly into your Databricks workspace. To request a live demo, click "Get Access" or email partnerships-databricks@tonic.ai.

**Requirements**

A Tonic Textual account is required for this solution accelerator. After signing up for a free account, you can generate an API key and store it as a Databricks secret for use with this solution accelerator. Tonic Textual includes a free trial of 100,000 words, so you can test the accelerator thoroughly without paying.

**License Information**

Usage of this solution accelerator within Databricks is subject to [the terms and conditions](https://uploads-ssl.webflow.com/62e28cf08913e81176ba2c39/65c2803dc01efc98df6a6b08_Tonic.ai%20Terms%20and%20Condition%20-%2020240205.pdf) of Tonic AI, Inc.
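**Illustrative Sketch**

The entity-based filtering attributed to the Tonic_Chain notebook can be illustrated with a minimal, self-contained sketch. This is not the notebook's code and it does not call Tonic Textual, LangChain, or Databricks Vector Search. The names `Chunk`, `extract_entities`, and `filter_chunks` are hypothetical, and the entity lookup is a simple substring match standing in for the entity detection that the listing says Tonic Textual performs. It only shows the general idea: find entities in the question, then keep the documents whose metadata mentions one of them.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A document chunk plus the entity metadata attached at ingestion time."""
    text: str
    entities: set = field(default_factory=set)


def extract_entities(question, known_entities):
    """Hypothetical stand-in for entity detection on the user's question.

    Matches a fixed vocabulary by case-insensitive substring; a real
    pipeline would delegate this step to an entity-recognition service.
    """
    lowered = question.lower()
    return {e for e in known_entities if e.lower() in lowered}


def filter_chunks(chunks, question_entities):
    """Keep chunks that share at least one entity with the question.

    When the question contains no recognized entity, no filter is applied
    and every chunk is returned.
    """
    if not question_entities:
        return list(chunks)
    wanted = {e.lower() for e in question_entities}
    return [c for c in chunks if wanted & {e.lower() for e in c.entities}]


# Made-up example data, for illustration only.
chunks = [
    Chunk("Acme Corp signed the lease in March.", {"Acme Corp"}),
    Chunk("Globex renewed its contract.", {"Globex"}),
    Chunk("General onboarding checklist.", set()),
]
vocabulary = {"Acme Corp", "Globex"}

question = "When did Acme Corp sign the lease?"
found = extract_entities(question, vocabulary)
relevant = filter_chunks(chunks, found)
print([c.text for c in relevant])
```

In a real deployment the filter would more likely be pushed down into the vector search query than applied in Python after retrieval. The in-memory version here only makes the selection logic visible.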

Provider:

Tonic.ai
