MuCGEC (Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction)
OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/MuCGEC
Resource overview:
MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC). It consists of 7,063 sentences collected from three distinct sources of text written by learners of Chinese as a Second Language (CSL). Each sentence was corrected by three annotators, and their corrections were carefully reviewed by experts, resulting in an average of 2.3 reference corrections per sentence.
Provider:
OpenDataLab
Created:
2022-09-01
Collected summary
Dataset introduction

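Background and challenges
Background overview
MuCGEC is an evaluation dataset designed specifically for Chinese grammatical error correction. It contains 7,063 sentences collected from multiple sources of Chinese-learner text. Each sentence was corrected by three annotators and reviewed by experts, yielding an average of 2.3 reference corrections per sentence. The dataset was released in 2022 by Soochow University and Alibaba to support multi-reference, multi-source evaluation.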



