CEFR-SP
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/yukiar/cefr-sp
Resource description:
CEFR-SP contains a total of 17,000 expert-annotated sentences, each assigned one of the six CEFR proficiency levels (A1, A2, B1, B2, C1, C2). Notably, we used only the publicly available subset and excluded the Newsela-based data. For model training, we used 10,000 labeled sentences. The task addressed with this dataset is sentence simplification aligned with the proficiency levels of ESL learners.
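The label scheme and training split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dataset's official loader. It assumes a hypothetical record layout of one `sentence<TAB>level` pair per line, where the level is either a CEFR name (`A1`) or an integer 1 to 6. The actual files in the repository may use a different layout. The helper names `parse_line` and `sample_training_set` are hypothetical.

```python
import random
from enum import IntEnum


class CEFR(IntEnum):
    """The six CEFR proficiency levels, ordered from A1 (lowest) to C2 (highest)."""
    A1 = 1
    A2 = 2
    B1 = 3
    B2 = 4
    C1 = 5
    C2 = 6


def parse_line(line: str) -> tuple[str, CEFR]:
    """Parse one record in a HYPOTHETICAL 'sentence<TAB>level' format.

    The level may be given as a name ('B1') or as an integer string ('3');
    the real CEFR-SP files may be laid out differently.
    """
    sentence, level = line.rstrip("\n").rsplit("\t", 1)
    label = CEFR(int(level)) if level.isdigit() else CEFR[level]
    return sentence, label


def sample_training_set(records, n=10_000, seed=0):
    """Draw n labeled sentences uniformly at random, without replacement."""
    if n > len(records):
        raise ValueError(f"requested {n} sentences but only {len(records)} available")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    return rng.sample(records, n)
```

Because `CEFR` is an `IntEnum`, levels compare in proficiency order (for example `CEFR.A2 < CEFR.B1`), which is convenient when a simplification target is expressed as "at or below level X". Seeding the sampler keeps the 10,000-sentence training subset identical across runs.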



