MSRS
Source: ModelScope Community. Updated 2025-12-05; indexed 2025-09-06.
Download link:
https://modelscope.cn/datasets/yale-nlp/MSRS
Resource description:
# MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
**[📄 Paper](https://arxiv.org/abs/2508.20867) | [💻 Code](https://github.com/yale-nlp/MSRS)**
The paper introduces a scalable framework for building evaluation benchmarks that require retrieval-augmented generation (RAG) systems to integrate information from multiple distinct sources and to generate long-form responses. Using this framework, we build two new benchmarks for Multi-Source Retrieval and Synthesis (MSRS): MSRS-Story and MSRS-Meet.
## 🚀 Quickstart
Load the corpora for MSRS-Story and MSRS-Meet:
```py
from datasets import load_dataset
story_corpus = load_dataset("yale-nlp/MSRS", "story-corpus", split="corpus")
meeting_corpus = load_dataset("yale-nlp/MSRS", "meeting-corpus", split="corpus")
```
Corpus Dataset Example:
```js
{
"id": // Unique ID for the document
"text": // Document text
}
```
Load the query-answer pairs for MSRS-Story and MSRS-Meet (available splits: `train`, `test`, and `validation`):
```py
from datasets import load_dataset
story_qa = load_dataset("yale-nlp/MSRS", "story-qa")
meeting_qa = load_dataset("yale-nlp/MSRS", "meeting-qa")
```
QA Dataset Example:
```js
{
"id": // Unique ID for the query
"query": // Query text
"gold_documents": // List of gold document IDs
"answer": // List of answer summaries
}
```
Provider: maas
Created: 2025-08-28
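Collected summary
Dataset introduction

Background and challenges
Background overview
The MSRS dataset is designed to test how well RAG systems integrate information from different sources and generate long-form answers. It contains corpora and query-answer pairs for two scenarios: stories and meetings.
The above content was collected and summarized by 遇见数据集.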



