
MRCEval

ModelScope community. Updated: 2025-10-09. Indexed: 2025-07-19.
Download link:
https://modelscope.cn/datasets/THU-KEG/MRCEval
Description:
# Dataset Card for MRCEval

MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark, by Shengkun Ma, Hao Peng, Lei Hou and Juanzi Li.

MRCEval is a comprehensive benchmark for machine reading comprehension (MRC), designed to assess the reading comprehension (RC) capabilities of LLMs. It covers 13 sub-tasks with a total of 2.1K high-quality multiple-choice questions.

## Dataset Structure

[More Information Needed]

### Data Instances

An example from the facts_understanding subtask looks as follows:

```
{
  "index": 0,
  "category": "Facts_entity",
  "source": "squad",
  "context": "Super_Bowl_50 The Broncos took an early lead in Super Bowl 50 and never trailed. Newton was limited by Denver's defense, which sacked him seven times and forced him into three turnovers, including a fumble which they recovered for a touchdown. Denver linebacker Von Miller was named Super Bowl MVP, recording five solo tackles, 2½ sacks, and two forced fumbles.",
  "question": "How many fumbles did Von Miller force?",
  "choices": ["two", "four", "three", "one"],
  "answer": "A"
}
```

### Data Fields

- `index`: a number, the index of the instance
- `category`: a string, the category of the instance
- `source`: a string, the source dataset of the instance
- `context`: a string, the passage to be read
- `question`: a string, the question asked about the context
- `choices`: a list of 4 strings, the candidate answers
- `answer`: a ClassLabel feature, shown as the letter of the correct choice (`"A"` in the example above)
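The instance format and field list above can be exercised with a small, standard-library-only sketch. It is not part of the MRCEval release: the helper names `validate_record`, `answer_text`, and `accuracy` are hypothetical, and the mapping of answer letters `A` to `D` onto positions 0 to 3 of `choices` is an assumption inferred from the single example on the card (where `"A"` corresponds to `"two"`), not something the card states explicitly.

```python
# Example record copied from the dataset card.
RECORD = {
    "index": 0,
    "category": "Facts_entity",
    "source": "squad",
    "context": (
        "Super_Bowl_50 The Broncos took an early lead in Super Bowl 50 and "
        "never trailed. Newton was limited by Denver's defense, which sacked "
        "him seven times and forced him into three turnovers, including a "
        "fumble which they recovered for a touchdown. Denver linebacker Von "
        "Miller was named Super Bowl MVP, recording five solo tackles, "
        "2\u00bd sacks, and two forced fumbles."
    ),
    "question": "How many fumbles did Von Miller force?",
    "choices": ["two", "four", "three", "one"],
    "answer": "A",
}

# Field names listed under "Data Fields" on the card.
REQUIRED_FIELDS = (
    "index", "category", "source", "context", "question", "choices", "answer",
)


def validate_record(record):
    """Check that a record has the documented fields and four choices."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if len(record["choices"]) != 4:
        raise ValueError("expected exactly 4 choices")


def answer_text(record):
    """Resolve the answer letter to its choice text.

    ASSUMPTION: letters A-D index `choices` in order (A -> 0, ..., D -> 3).
    """
    position = ord(record["answer"]) - ord("A")
    if not 0 <= position < len(record["choices"]):
        raise ValueError(f"unexpected answer label: {record['answer']!r}")
    return record["choices"][position]


def accuracy(records, predicted_letters):
    """Fraction of records whose predicted letter equals the gold letter."""
    if len(records) != len(predicted_letters):
        raise ValueError("records and predictions differ in length")
    if not records:
        return 0.0
    hits = sum(
        1 for rec, pred in zip(records, predicted_letters)
        if rec["answer"] == pred
    )
    return hits / len(records)


if __name__ == "__main__":
    validate_record(RECORD)
    print(answer_text(RECORD))
    print(accuracy([RECORD], ["A"]))
```

If the released files store `answer` as an integer class index rather than a letter, `answer_text` would need to index `choices` directly instead of converting from a letter.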

Provider: maas
Created: 2025-07-15