
SeacomSrl/rag-data

Source: Hugging Face. Updated 2024-05-15. Indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/SeacomSrl/rag-data
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
language:
- it
size_categories:
- 1K<n<10K
features:
- name: context
  dtype: string
- name: question
  dtype: string
- name: answer
  dtype: string
tags:
- croissant
---

# The following dataset is constantly improving; any suggestion or help is welcome.

**Retrieval-Augmented Generation (RAG) Dataset**

The Retrieval-Augmented Generation (RAG) dataset is an Italian-translated sub-dataset of [Neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000), designed for RAG-optimized models, crafted by [Seacom Srl](https://seacom.it/), and released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).

#### Languages

The text in the dataset is in Italian.

#### Data Instances

A typical data point comprises a context, a question about the context, and an answer to the question. The context is obtained from [Falcon RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), and the question and answer for each data point are generated by GPT-4.

An example from the dataset looks like the following:

```
{
  context: ...
  question: ...
  answer: ...
}
```

#### Data Fields

- `context`: A string consisting of a range of tokens.
- `question`: A string consisting of a question related to the context.
- `answer`: A string consisting of an answer to the question.
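
The three fields described in the card can be consumed roughly as in the sketch below. It is only an illustration, not the maintainers' own loading code. The helper names (`validate_record`, `build_prompt`, `load_rag_data`), the Italian prompt template, the sample record, and the `"train"` split name are all assumptions made for this example; only the repository id `SeacomSrl/rag-data` and the field names `context`, `question`, and `answer` come from the card. Loading requires the Hugging Face `datasets` package and network access, so that step is kept in a separate function.

```python
from typing import Dict

# Field names taken from the dataset card.
REQUIRED_FIELDS = ("context", "question", "answer")


def validate_record(record: Dict[str, str]) -> bool:
    """Return True if the record carries all three string fields from the card."""
    return all(isinstance(record.get(name), str) for name in REQUIRED_FIELDS)


def build_prompt(record: Dict[str, str]) -> str:
    """Combine context and question into one prompt string.

    The template is a hypothetical example, not part of the dataset.
    """
    return (
        f"Contesto:\n{record['context']}\n\n"
        f"Domanda: {record['question']}\n"
        "Risposta:"
    )


def load_rag_data(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: a split named "train" exists; check the dataset page if it does not.
    """
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    return load_dataset("SeacomSrl/rag-data", split=split)


if __name__ == "__main__":
    # Invented placeholder record, used only to show the expected shape.
    sample = {
        "context": "Roma è la capitale d'Italia.",
        "question": "Qual è la capitale d'Italia?",
        "answer": "Roma.",
    }
    if validate_record(sample):
        print(build_prompt(sample))
```

In a fine-tuning or evaluation setup, the `answer` field would typically serve as the reference output for the prompt built from `context` and `question`.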

Provider:
SeacomSrl