MuCGEC
arXiv · Updated 2022-05-04 · Indexed 2024-06-21
Download link:
https://github.com/HillZhang1999/MuCGEC
Description:
MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese grammatical error correction (CGEC), created jointly by the Institute of Artificial Intelligence, Soochow University, and Alibaba DAMO Academy. It contains 7,063 sentences written by learners of Chinese as a second language (CSL), drawn from three different sources. Each sentence was corrected independently by three annotators and then reviewed by a senior annotator, which yields an average of 2.3 reference corrections per sentence. By supplying multiple references and multiple data sources, the dataset aims to make the evaluation of CGEC models more reliable. The release also examines CGEC evaluation methodology, including the effect of using multiple references and of character-based evaluation metrics.
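To make "multi-reference" and "character-based metric" concrete, here is a minimal illustrative sketch, not the official scorer released with the dataset. It assumes that edits are extracted at the character level with Python's `difflib`, that each sentence is scored against whichever of its references gives the hypothesis the highest F0.5, and that true/false positive and false negative counts are summed over the corpus. The function names (`char_edits`, `best_reference_counts`, `corpus_f05`) are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Set, Tuple

# Hypothetical edit representation: (start, end, replacement) over source characters.
Edit = Tuple[int, int, str]


def char_edits(source: str, target: str) -> Set[Edit]:
    """Extract character-level edits that turn `source` into `target` (difflib-based assumption)."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from counts; empty predictions/gold count as precision/recall 1.0."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def best_reference_counts(source: str, hypothesis: str,
                          references: List[str]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) against the reference that maximizes sentence-level F0.5."""
    if not references:
        raise ValueError("each sentence needs at least one reference")
    hyp = char_edits(source, hypothesis)
    best = None
    for ref in references:
        gold = char_edits(source, ref)
        counts = (len(hyp & gold), len(hyp - gold), len(gold - hyp))
        # Strict '>' keeps the earliest reference on ties (a simplifying assumption).
        if best is None or f_beta(*counts) > f_beta(*best):
            best = counts
    return best


def corpus_f05(samples: Iterable[Tuple[str, str, List[str]]]) -> float:
    """Sum per-sentence best-reference counts, then compute corpus-level F0.5."""
    tp = fp = fn = 0
    for source, hypothesis, references in samples:
        s_tp, s_fp, s_fn = best_reference_counts(source, hypothesis, references)
        tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
    return f_beta(tp, fp, fn)


if __name__ == "__main__":
    samples = [
        # Redundant character removed; matches the first of two references.
        ("我很喜欢吃吃苹果", "我很喜欢吃苹果", ["我很喜欢吃苹果", "我非常喜欢吃苹果"]),
        # Word-order error left uncorrected; the character diff sees two gold edits.
        ("他去学校了昨天", "他去学校了昨天", ["他昨天去学校了"]),
    ]
    print(f"{corpus_f05(samples):.3f}")  # prints 0.714
```

Because a character diff splits the word-order fix in the second sample into an insertion plus a deletion, the missed correction costs two false negatives; this is one reason the choice of edit granularity matters when comparing character-based and word-based metrics.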
Provided by:
Institute of Artificial Intelligence, Soochow University
Created:
2022-04-23



