CEFR-SP

Indexed from arXiv on 2025-09-30

Multilingual learning

Text simplification

Data link:

https://github.com/yukiar/cefr-sp

Resource overview:

CEFR-SP contains 17,000 expert-annotated sentences, each assigned one of the six CEFR proficiency levels (A1 to C2). Notably, we used only the publicly available subset, which excludes the Newsela-based data. For training, we used 10,000 labeled sentences. The task associated with this dataset is sentence simplification aligned with the proficiency levels of ESL learners.
