SCDE
Source: arXiv | Updated: 2020-04-28 | Indexed: 2024-06-21
Download links:
http://www.21cnjy.com/; http://5utk.ks5u.com/; http://zujuan.xkw.com/; https://www.gzenxx.com/Html/rw/
Resource description:
The SCDE dataset was created by the Language Technologies Institute at Carnegie Mellon University. It contains approximately 6,000 passages and nearly 30,000 fill-in-the-blank questions, and is designed to evaluate computational models on sentence-level cloze tasks. The data comes from English exams administered in public schools. Each problem consists of a passage with multiple sentence-level blanks and a shared candidate answer set that includes distractors written by English teachers. Solving these problems requires non-local, discourse-level context that extends beyond the immediate neighborhood of a single sentence, and the high-quality distractors make the task harder still. The dataset is primarily intended to drive the development of more advanced language understanding models.
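The problem structure described above (one passage, several sentence-level blanks, one shared candidate set with distractors) can be sketched as follows. This is a minimal illustration only: the `ClozeItem` class, its field names, the `____` blank marker, and the toy passage are all assumptions made up for this example and do not reflect the dataset's actual file format or contents.

```python
from dataclasses import dataclass
from typing import List

BLANK = "____"  # hypothetical blank marker, not the dataset's real one


@dataclass
class ClozeItem:
    """Hypothetical in-memory form of one sentence cloze problem."""
    passage: str           # text containing one BLANK per missing sentence
    candidates: List[str]  # shared candidate set, distractors included
    answers: List[int]     # index into candidates for each blank, in order


def fill_blanks(item: ClozeItem, choices: List[int]) -> str:
    """Insert the chosen candidate sentence into each blank, in order."""
    parts = item.passage.split(BLANK)
    if len(choices) != len(parts) - 1:
        raise ValueError("need exactly one choice per blank")
    out = [parts[0]]
    for choice, rest in zip(choices, parts[1:]):
        out.append(item.candidates[choice])
        out.append(rest)
    return "".join(out)


def blank_accuracy(item: ClozeItem, predicted: List[int]) -> float:
    """Fraction of blanks whose predicted candidate matches the answer key."""
    correct = sum(p == a for p, a in zip(predicted, item.answers))
    return correct / len(item.answers)


# Invented toy problem: two blanks, three candidates (the last is a distractor).
toy = ClozeItem(
    passage="Tom missed the bus. ____ He arrived late anyway. ____",
    candidates=[
        "So he ran to school.",
        "His teacher was not pleased.",
        "The bus was painted red.",
    ],
    answers=[0, 1],
)

print(fill_blanks(toy, toy.answers))
print(blank_accuracy(toy, [0, 2]))  # second blank wrongly filled with the distractor
```

Because all blanks draw from the same candidate set, a model has to decide jointly which sentence goes where and which candidates to leave unused, which is where the distractors add difficulty.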
Provided by:
Language Technologies Institute, Carnegie Mellon University
Created:
2020-04-28



