
ronantakizawa/jfleg-ja

Source: Hugging Face. Updated: 2025-10-24. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/ronantakizawa/jfleg-ja
Description:
JFLEG-JA is a Japanese grammatical error correction (GEC) dataset inspired by the original JFLEG (JHU FLuency-Extended GUG) benchmark. It contains 1,335 Japanese sentences with grammatical errors, each paired with 4 human-quality corrections that target both grammaticality and fluency. The error types covered include particle errors, verb conjugation errors, register/formality errors, word choice errors, counter word errors, sentence structure errors, conjunction errors, modifier errors, auxiliary verb errors, and conditional errors. The dataset was built from a combination of AI-generated sentences and manually written replacement sentences. It then went through rigorous quality checks to ensure that every example has 4 unique corrections, that no sentence or correction is duplicated, that the erroneous sentences look natural and realistic, and that the corrected sentences sound like a native speaker wrote them.
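
The quality checks named in the description (4 unique corrections per example, no duplicated sentences) can be sketched roughly as follows. This is only an illustration, not the dataset author's actual pipeline. The field names `sentence` and `corrections` are assumptions borrowed from the original JFLEG layout and may not match this dataset's real columns, and the single toy record was made up for this example rather than taken from JFLEG-JA.

```python
from collections import Counter

# Per the description: each erroneous sentence comes with 4 corrections.
EXPECTED_CORRECTIONS = 4


def check_records(records):
    """Return a list of human-readable problems found in `records`.

    Each record is assumed (hypothetically) to be a dict with a
    "sentence" string and a "corrections" list of strings.
    """
    problems = []
    # Count source sentences so duplicates across records can be flagged.
    sentence_counts = Counter(rec["sentence"] for rec in records)
    for i, rec in enumerate(records):
        corrections = rec["corrections"]
        if sentence_counts[rec["sentence"]] > 1:
            problems.append(f"record {i}: duplicate source sentence")
        if len(corrections) != EXPECTED_CORRECTIONS:
            problems.append(
                f"record {i}: expected {EXPECTED_CORRECTIONS} corrections, "
                f"got {len(corrections)}"
            )
        if len(set(corrections)) != len(corrections):
            problems.append(f"record {i}: repeated correction")
    return problems


# Toy record invented for illustration (a particle error); not from the dataset.
toy_records = [
    {
        "sentence": "私は学校を行きます。",
        "corrections": [
            "私は学校に行きます。",
            "私は学校へ行きます。",
            "私は学校に通います。",
            "私は学校へ通っています。",
        ],
    },
]

problems = check_records(toy_records)
print(problems)
```

An empty list means the records passed all three checks. The same function could be applied to the real data once its actual column names are confirmed.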
Provider:
ronantakizawa