Vikhrmodels/Grounded-RAG-RU-v2
收藏Hugging Face2024-12-14 更新2025-04-08 收录
下载链接:
https://hf-mirror.com/datasets/Vikhrmodels/Grounded-RAG-RU-v2
下载链接
链接失效反馈官方服务:
资源简介:
这是一个用于训练语言模型回答关于文档问题的能力的数据集,基于俄罗斯维基百科的13,000篇文章,通过合成问题和答案(使用gpt-4-turbo-1106模型)构建而成。数据集包含4,047个唯一簇,每个簇代表一组文档,模拟了检索系统中的找到的结果。总共有50,210个唯一的对话。对话以HuggingFace格式呈现,包含文档、用户和助手的角色。数据集的目标是训练模型能够回答基于1到5个不同格式文档的简单和复杂问题,并学会拒绝那些在找到的文档中没有答案的问题。此外,模型还会生成一个包含相关信息的文档选择的单独回复,以便更好地控制和监控模型。
This is a dataset for training language models to answer questions about documents, based on 13,000 articles from the Russian Wikipedia, constructed with synthetic questions and answers (using the gpt-4-turbo-1106 model). The dataset contains 4,047 unique clusters, each representing a group of documents, simulating found results in a retrieval system. There are a total of 50,210 unique conversations. Conversations are presented in the HuggingFace format, including roles for documents, users, and assistants. The goal of the dataset is to train models to answer simple and complex questions based on 1 to 5 documents of different formats and to learn to reject questions that do not have answers in the found documents. In addition, the model generates a separate response containing the selection of documents with relevant information, allowing for better control and monitoring of the model.
提供机构:
Vikhrmodels



