SeacomSrl/rag-data

Name: SeacomSrl/rag-data
Creator: SeacomSrl
Published: 2024-05-15 12:05:12
License: No description provided on the listing page (the dataset card itself specifies apache-2.0)

Source: Hugging Face. Updated 2024-05-15; indexed 2024-05-25.

Download link:

https://hf-mirror.com/datasets/SeacomSrl/rag-data

Resource description:

---
license: apache-2.0
task_categories:
- question-answering
language:
- it
size_categories:
- 1K<n<10K
features:
- name: context
  dtype: string
- name: question
  dtype: string
- name: answer
  dtype: string
tags:
- croissant
---

# This dataset is constantly being improved; any suggestions or help are welcome.

**Retrieval-Augmented Generation (RAG) Dataset**

The Retrieval-Augmented Generation (RAG) data is an Italian-translated subset of [Neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000), designed for RAG-optimized models. It was crafted by [Seacom Srl](https://seacom.it/) and is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).

#### Languages

The text in the dataset is in Italian.

#### Data Instances

A typical data point comprises a context, a question about that context, and an answer to the question. The context is taken from [Falcon RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), and the question and answer for each data point were generated by GPT-4. An example from the dataset looks like the following:

```
{
  "context": "...",
  "question": "...",
  "answer": "..."
}
```

#### Data Fields

- `context`: A string containing a passage of text (a span of tokens) drawn from Falcon RefinedWeb.
- `question`: A string containing a question related to the context.
- `answer`: A string containing the answer to the question.
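Since each record is just three strings, a short sketch of how the rows might be consumed for RAG-style fine-tuning may help. It assumes the rows have already been loaded as plain dicts, for example with the Hugging Face `datasets` library via `load_dataset("SeacomSrl/rag-data")` (the split name, likely `train`, is not stated in the card). The Italian prompt template, the `RagRecord` class, and the `to_training_pairs` helper are hypothetical illustrations, not part of the dataset or an official loader.

```python
from dataclasses import dataclass

# Field names come from the dataset card. The prompt template is a
# HYPOTHETICAL example; the dataset does not prescribe any template.
PROMPT_TEMPLATE = (
    "Contesto:\n{context}\n\n"
    "Domanda: {question}\n"
    "Risposta:"
)

REQUIRED_FIELDS = ("context", "question", "answer")


@dataclass(frozen=True)
class RagRecord:
    """One row of the dataset: a context, a question, and its answer."""

    context: str
    question: str
    answer: str

    @classmethod
    def from_dict(cls, row: dict) -> "RagRecord":
        """Check that the row has the three string fields listed in the card."""
        for name in REQUIRED_FIELDS:
            value = row.get(name)
            if not isinstance(value, str):
                raise ValueError(
                    f"field {name!r} must be a string, got {type(value).__name__}"
                )
        return cls(row["context"], row["question"], row["answer"])

    def to_prompt(self) -> str:
        """Build the model input from context + question; the answer is the target."""
        return PROMPT_TEMPLATE.format(context=self.context, question=self.question)


def to_training_pairs(rows):
    """Convert raw dict rows into (prompt, target) pairs (hypothetical helper)."""
    return [(rec.to_prompt(), rec.answer) for rec in map(RagRecord.from_dict, rows)]


if __name__ == "__main__":
    # Toy row written for illustration; it is not taken from the dataset.
    sample = {
        "context": "Il Colosseo si trova a Roma.",
        "question": "Dove si trova il Colosseo?",
        "answer": "A Roma.",
    }
    prompt, target = to_training_pairs([sample])[0]
    print(prompt)
    print("->", target)
```

Keeping the answer separate from the prompt makes it straightforward to compute the training loss only on the target text; a chat-style template or a different prompt language would work just as well.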

Provider:

SeacomSrl