JHU FLuency-Extended GUG corpus (JFLEG)

Name: JHU FLuency-Extended GUG corpus (JFLEG)
Creator: Center for Language and Speech Processing
Published: 2017-02-14 11:47:34
License: No description available

Source: arXiv. Updated: 2017-02-14. Indexed: 2024-06-21.

Download link:

https://github.com/keisks/jfleg

Description:

JFLEG was created by the Center for Language and Speech Processing for developing and evaluating grammatical error correction (GEC) systems. It contains 1,511 sentences covering a broad range of English proficiency levels. Its corrections are holistic fluency edits: they fix grammatical errors and also make the original text read more like native English. The dataset extends the GUG corpus, and four human-written corrections were collected for each sentence to capture a variety of edit types. JFLEG is intended for evaluating and improving GEC systems, and it addresses the limitations of existing datasets in assessing fluency as well as grammatical accuracy.
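
Because each sentence comes with four human-written corrections, a typical first step is to pair every source sentence with its references. The following is a minimal sketch, not the dataset authors' tooling. It assumes the data is stored as line-aligned plain-text files (one source file plus one file per reference set); the function name `load_parallel` and the `Example` type are hypothetical, and the actual file names should be checked against the repository.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    source: str            # original sentence
    references: List[str]  # human-written corrections of that sentence


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_parallel(src_path: str, ref_paths: List[str]) -> List[Example]:
    """Pair line i of the source file with line i of every reference file.

    Assumption: all files are line-aligned, so they must have equal length.
    """
    sources = read_lines(src_path)
    ref_columns = [read_lines(path) for path in ref_paths]
    for path, column in zip(ref_paths, ref_columns):
        if len(column) != len(sources):
            raise ValueError(
                f"{path} has {len(column)} lines, expected {len(sources)}"
            )
    return [
        Example(source, [column[i] for column in ref_columns])
        for i, source in enumerate(sources)
    ]
```

With four reference files, each returned `Example` carries four corrections, which is the shape a multi-reference GEC metric expects. A call might look like `load_parallel("dev.src", ["dev.ref0", "dev.ref1", "dev.ref2", "dev.ref3"])`, where the file names are assumed for illustration.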

Provider:

Center for Language and Speech Processing

Created:

2017-02-14
