CLOTH
Source: arXiv. Indexed: 2025-09-30
Download link:
https://www.cs.cmu.edu/~glai1/data/cloth/
Description:
CLOTH is a large-scale cloze test dataset whose questions were written by human teachers for middle school and high school language exams. Compared with earlier, automatically generated cloze test datasets, it demands deeper language understanding and a wider attention span. The dataset is split into two subsets, CLOTH-M (middle school) and CLOTH-H (high school), and its questions fall into four types: grammar, short-term reasoning, matching/paraphrasing, and long-term reasoning. After cleaning, the dataset contains 7,131 passages and 99,433 questions. The task is to assess language proficiency through cloze tests.
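To make the task concrete, here is a minimal Python sketch of one multiple-choice fill-in-the-blank passage and how accuracy over its questions could be computed. The `ClozePassage` class, its field names, the `_` blank marker, and the toy passage are all assumptions made for illustration; they are not taken from the dataset files, whose actual layout may differ.

```python
from dataclasses import dataclass
from typing import List

BLANK = "_"  # assumed blank marker; the real files may use a different token


@dataclass
class ClozePassage:
    """Hypothetical in-memory form of one passage (not the official schema)."""
    article: str               # passage text, with BLANK where a word was removed
    options: List[List[str]]   # one list of candidate words per blank
    answers: List[int]         # index of the correct candidate for each blank


def fill_blanks(passage: ClozePassage, choices: List[int]) -> str:
    """Return the article with each blank replaced by the chosen candidate."""
    filled = []
    blank_idx = 0
    for token in passage.article.split():
        if token == BLANK:
            filled.append(passage.options[blank_idx][choices[blank_idx]])
            blank_idx += 1
        else:
            filled.append(token)
    return " ".join(filled)


def accuracy(passages: List[ClozePassage], predictions: List[List[int]]) -> float:
    """Fraction of blanks answered correctly across all passages."""
    correct = 0
    total = 0
    for passage, predicted in zip(passages, predictions):
        for gold, guess in zip(passage.answers, predicted):
            correct += int(gold == guess)
            total += 1
    return correct / total if total else 0.0


# Toy passage invented for this sketch; it does not come from CLOTH.
example = ClozePassage(
    article="Tom was late , so he _ to school . He was _ when he arrived .",
    options=[["ran", "walked", "slept", "ate"],
             ["tired", "hungry", "early", "tall"]],
    answers=[0, 0],
)

print(fill_blanks(example, example.answers))
print(accuracy([example], [[0, 2]]))  # one of the two blanks is correct
```

Scoring per blank rather than per passage matches the way the description counts questions (99,433) separately from passages (7,131).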
Provided by:
Collected from English teachers' resources



