SeacomSrl/rag-data
Source: Hugging Face. Updated 2024-05-15; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/SeacomSrl/rag-data
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
language:
- it
size_categories:
- 1K<n<10K
features:
- name: context
  dtype: string
- name: question
  dtype: string
- name: answer
  dtype: string
tags:
- croissant
---
# This dataset is continually being improved; suggestions and help are welcome.
**Retrieval-Augmented Generation (RAG) Dataset**
Retrieval-Augmented Generation (RAG) data is an Italian-translated subset of [Neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000), designed for RAG-optimized models, crafted by [Seacom Srl](https://seacom.it/), and released under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).
#### Languages
The text in the dataset is in Italian.
#### Data Instances
A typical data point comprises a context, a question about the context, and an answer to the question. The context is obtained from [Falcon RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), and the question and answer for each data point are generated by GPT-4.
An example from the dataset looks like the following:
```
{
context: ...
question: ...
answer: ...
}
```
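As a rough illustration of the record shape above, the following Python sketch checks that a record carries the three string fields. The names `RagRecord`, `REQUIRED_FIELDS`, `is_valid_record`, and `load_rag_data` are hypothetical helpers, not part of the dataset, and treating empty strings as invalid is an assumption rather than something the card states. The loader relies on the `datasets` library's `load_dataset` function and needs network access, so it is defined here but not called.

```python
from typing import TypedDict


class RagRecord(TypedDict):
    """Shape of one data point, as described in the card."""
    context: str
    question: str
    answer: str


# Field names taken from the card's feature list.
REQUIRED_FIELDS = ("context", "question", "answer")


def is_valid_record(record: dict) -> bool:
    """Return True if all three fields are present as non-empty strings.

    Rejecting empty strings is an assumption made for this sketch.
    """
    return all(
        isinstance(record.get(name), str) and record[name].strip() != ""
        for name in REQUIRED_FIELDS
    )


def load_rag_data():
    """Load the dataset from the Hugging Face Hub.

    Requires the `datasets` package and network access; the available
    splits are not stated in the card, so none is selected here.
    """
    from datasets import load_dataset

    return load_dataset("SeacomSrl/rag-data")
```

Calling `is_valid_record` on each row after loading is one simple way to spot malformed entries before using the data.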
#### Data Fields
- `context`: A string containing the passage of text that the question is about.
- `question`: A string containing a question about the context.
- `answer`: A string containing the answer to the question.
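To show how the three fields might be combined for a RAG-style model, here is a minimal sketch that turns one record into a prompt and completion pair. The function names `build_prompt` and `to_training_pair`, the Italian labels ("Contesto", "Domanda", "Risposta"), and the output keys `prompt` and `completion` are all illustrative assumptions; the card does not prescribe any prompt template.

```python
def build_prompt(context: str, question: str) -> str:
    """Combine a context passage and a question into a single prompt.

    The Italian labels are an illustrative choice, not a format
    defined by the dataset.
    """
    return (
        "Contesto:\n"
        + context.strip()
        + "\n\nDomanda: "
        + question.strip()
        + "\nRisposta:"
    )


def to_training_pair(record: dict) -> dict:
    """Map one record to a hypothetical prompt/completion pair."""
    return {
        "prompt": build_prompt(record["context"], record["question"]),
        # Leading space separates the completion from the "Risposta:" label.
        "completion": " " + record["answer"].strip(),
    }
```

Keeping the prompt builder separate from the record mapping lets the same template be reused at inference time, when only `context` and `question` are available.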
Provider:
SeacomSrl



