
Quality Estimation Dataset for Japanese Grammatical Error Correction

Source: arXiv. Updated: 2022-01-20. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2201.08038v1
Description:
The Quality Estimation Dataset for Japanese Grammatical Error Correction was jointly developed by Tokyo Metropolitan University and RIKEN. It contains 4,391 entries built from source texts in the Lang-8 corpus and the corrected outputs of four grammatical error correction (GEC) systems; each entry carries human evaluation scores. To create it, four representative GEC systems corrected texts written by learners of Japanese, and three Japanese university students evaluated the corrections manually. The dataset is intended mainly for developing and validating automatic evaluation models for Japanese grammatical error correction. It aims to address the limitations of existing reference-dependent evaluation methods and to advance reference-free automatic evaluation.
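As a rough illustration of how such a dataset might be used to validate an automatic evaluation model, the sketch below averages the annotators' scores for each entry and correlates them with a metric's scores. Everything here is an assumption made for illustration: the `QEEntry` class, its field names, the one-source-one-correction layout, and the choice of Pearson correlation are hypothetical and do not describe the released file format or the authors' procedure.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass
class QEEntry:
    """Hypothetical record layout; field names are NOT the released schema."""
    source: str                        # learner sentence (e.g. from Lang-8)
    correction: str                    # output of one GEC system
    system: str                        # label of the system that produced it
    annotator_scores: Tuple[int, ...]  # one score per human annotator

    def human_score(self) -> float:
        """Average the annotators' scores into a single human judgement."""
        return sum(self.annotator_scores) / len(self.annotator_scores)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def metric_agreement(entries: Sequence[QEEntry],
                     metric_scores: Sequence[float]) -> float:
    """Correlate a metric's scores with the averaged human scores."""
    human = [entry.human_score() for entry in entries]
    return pearson(human, metric_scores)
```

A higher correlation would suggest that the metric ranks corrections more like the human annotators do. Any real use would need to follow the actual file format and score scale described in the paper.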
Provider:
Tokyo Metropolitan University
Created:
2022-01-20