Who-did-What (Who did What)
Source: OpenDataLab. Updated 2026-05-24; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/Who-did-What
Description:
We constructed a new "Who did What" (WDW) dataset of over 200,000 fill-in-the-gap (cloze) multiple-choice reading comprehension questions built from the LDC English Gigaword newswire corpus. The WDW dataset has several novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015), we avoid using article summaries to form questions. Instead, each question is formed from two independent articles: one given as the passage to be read, and a separate article on the same event that is used to form the question. Second, we avoid anonymization: each choice is a person named entity. Third, the questions have been filtered to remove a fraction that simple baselines solve easily, while the remaining questions are still 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community.
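The description above can be made concrete with a minimal Python sketch of one cloze question and of a baseline filter. Everything in it is an assumption for illustration only: the `ClozeQuestion` layout, the `XXX` placeholder, the invented names and sentences, and the "most frequent name" baseline are hypothetical and are not taken from the dataset's actual file format or from the authors' filtering code.

```python
from dataclasses import dataclass
from typing import List

BLANK = "XXX"  # hypothetical placeholder marking the removed person name


@dataclass
class ClozeQuestion:
    passage: str        # the article given to the reader
    question: str       # sentence from a second article on the same event, one name blanked
    choices: List[str]  # candidate person names, kept as real names (no anonymization)
    answer: str         # the name that was removed from the question


def fill_blank(q: ClozeQuestion, choice: str) -> str:
    """Return the question sentence with the blank replaced by one choice."""
    return q.question.replace(BLANK, choice, 1)


def most_frequent_baseline(q: ClozeQuestion) -> str:
    """Pick the choice mentioned most often in the passage (ties go to the first listed)."""
    return max(q.choices, key=lambda name: q.passage.count(name))


def is_easy(q: ClozeQuestion) -> bool:
    """A question counts as 'easy' here if the simple baseline already answers it."""
    return most_frequent_baseline(q) == q.answer


# Invented example data, not drawn from the dataset.
example = ClozeQuestion(
    passage=(
        "Alice Smith met Bob Jones on Monday. "
        "Alice Smith said the talks would continue."
    ),
    question="XXX said the talks would continue after Monday's meeting.",
    choices=["Bob Jones", "Alice Smith"],
    answer="Alice Smith",
)

print(fill_blank(example, example.answer))
print("easy for the baseline:", is_easy(example))
```

In this sketch, a filtering pass would drop questions for which `is_easy` returns `True`, which mirrors the idea of removing items that simple baselines solve while keeping the harder ones.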
Provider:
OpenDataLab
Created:
2022-05-23
Collected summary

Background and challenges
Background overview
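Who-did-What is a reading comprehension dataset built from the LDC English Gigaword corpus. It contains more than 200,000 cloze-style multiple-choice questions. Each question is generated from two independent articles and keeps entity names rather than anonymizing them, and the dataset is intended as a challenge task for the community.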
The summary above was collected and generated by 遇见数据集.



