
tmu-nlp/tmu_gfm_dataset

Hugging Face · Updated 2024-01-18 · Indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/tmu-nlp/tmu_gfm_dataset
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- machine-generated
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text2text-generation
task_ids: []
paperswithcode_id: null
pretty_name: TMU-GFM-Dataset
tags:
- grammatical-error-correction
dataset_info:
  features:
  - name: source
    dtype: string
  - name: output
    dtype: string
  - name: grammer
    sequence: int32
  - name: fluency
    sequence: int32
  - name: meaning
    sequence: int32
  - name: system
    dtype: string
  - name: ave_g
    dtype: float32
  - name: ave_f
    dtype: float32
  - name: ave_m
    dtype: float32
  splits:
  - name: train
    num_bytes: 1446144
    num_examples: 4221
  download_size: 1270197
  dataset_size: 1446144
---

# Dataset Card for TMU-GFM-Dataset

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [N/A]
- **Repository:** https://github.com/tmu-nlp/TMU-GFM-Dataset
- **Paper:** [SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction](https://www.aclweb.org/anthology/2020.coling-main.573.pdf)
- **Leaderboard:** [N/A]
- **Point of Contact:** Check the paper.

### Dataset Summary

The authors collected manual evaluations of the grammaticality, fluency, and meaning preservation of system outputs for 1,381 sentences from CoNLL 2013.
To obtain manual evaluations for a variety of system outputs, each source sentence was corrected by the following five typical systems: statistical machine translation (SMT) (Grundkiewicz and Junczys-Dowmunt, 2018), recurrent neural network (RNN) (Luong et al., 2015), convolutional neural network (CNN) (Chollampatt and Ng, 2018), self-attention network (SAN) (Vaswani et al., 2017), and SAN with copy mechanism (SAN+Copy) (Zhao et al., 2019).
Manual evaluations of grammaticality, fluency, and meaning preservation were assigned to a total of 4,223 sentences.

### Supported Tasks and Leaderboards

Grammatical Error Correction

### Languages

English

## Dataset Structure

### Data Instances

An example from the TMU-GFM-Dataset looks as follows:

```
{'ave_f': 3.4000000953674316,
 'ave_g': 3.4000000953674316,
 'ave_m': 3.5999999046325684,
 'fluency': [3, 4, 3, 4, 3],
 'grammer': [3, 4, 3, 4, 3],
 'meaning': [3, 4, 4, 4, 3],
 'output': 'After all, there will be an endless battle between the technology and human mentality.',
 'source': 'Afterall there will be an endless battle between the technology and human mentality.',
 'system': 'lstm,cnn'}
```

### Data Fields

There are 9 columns in the tmu-gfm-dataset.

- source: Source sentence.
- output: System output sentence.
- grammer: Grammaticality annotations by 5 annotators (the field name is spelled this way in the dataset).
- fluency: Fluency annotations by 5 annotators.
- meaning: Meaning preservation annotations by 5 annotators.
- system: Which system the output sentence is from.
- ave_g: Average grammaticality score.
- ave_f: Average fluency score.
- ave_m: Average meaning preservation score.

### Data Splits

The authors divided the dataset into train/dev/test splits of 3,376/422/423 sentences and used them to fine-tune BERT in their paper. The Hugging Face release itself provides a single `train` split containing all 4,221 examples.

## Dataset Creation

### Curation Rationale

The authors proposed a reference-less metric trained on manual evaluations of system outputs for grammatical error correction (GEC).
They note that previous studies have shown reference-less metrics to be promising; however, existing metrics are not optimized for manual evaluation of system outputs because no dataset of system outputs with manual evaluations existed.
To achieve a better correlation with manual evaluation, they created a dataset for optimizing each sub-metric against manual evaluations of GEC systems. Their annotators evaluated the outputs of five typical GEC systems.

### Source Data

#### Initial Data Collection and Normalization

The authors collected manual evaluations of the grammaticality, fluency, and meaning preservation of system outputs for 1,381 sentences from CoNLL 2013.
To obtain manual evaluations for a variety of system outputs, each source sentence was corrected by the following five typical systems: statistical machine translation (SMT) (Grundkiewicz and Junczys-Dowmunt, 2018), recurrent neural network (RNN) (Luong et al., 2015), convolutional neural network (CNN) (Chollampatt and Ng, 2018), self-attention network (SAN) (Vaswani et al., 2017), and SAN with copy mechanism (SAN+Copy) (Zhao et al., 2019).

#### Who are the source language producers?

machine-generated

### Annotations

#### Annotation process

After excluding duplicate corrected sentences, manual evaluations of grammaticality, fluency, and meaning preservation were assigned to a total of 4,223 sentences, as follows:

- Grammaticality: Annotators evaluated the grammatical correctness of the system output. The authors followed the five-point scale evaluation criteria (4: Perfect, 3: Comprehensible, 2: Somewhat comprehensible, 1: Incomprehensible, and 0: Other) proposed by Heilman et al. (2014).
- Fluency: Annotators evaluated how natural the sentence sounds to native speakers. The authors followed the criteria (4: Extremely natural, 3: Somewhat natural, 2: Somewhat unnatural, and 1: Extremely unnatural) proposed by Lau et al. (2015).
- Meaning preservation: Annotators evaluated the extent to which the meaning of the source sentence is preserved in the system output. The authors followed the criteria (4: Identical, 3: Minor differences, 2: Moderate differences, 1: Substantially different, and 0: Other) proposed by Xu et al. (2016).

Finally, the authors created a dataset with manual evaluations for a total of 4,221 sentences, excluding sentences for which three or more annotators answered "0: Other."

#### Who are the annotators?

Five native English-speaking annotators recruited via Amazon Mechanical Turk.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

@inproceedings{yoshimura-etal-2020-reference,
    title = "{SOME}: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction",
    author = "Yoshimura, Ryoma and Kaneko, Masahiro and Kajiwara, Tomoyuki and Komachi, Mamoru",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.573",
    pages = "6516--6522",
    abstract = "We propose a reference-less metric trained on manual evaluations of system outputs for grammatical error correction (GEC). Previous studies have shown that reference-less metrics are promising; however, existing metrics are not optimized for manual evaluations of the system outputs because no dataset of the system output exists with manual evaluation. This study manually evaluates outputs of GEC systems to optimize the metrics. Experimental results show that the proposed metric improves correlation with the manual evaluation in both system- and sentence-level meta-evaluation. Our dataset and metric will be made publicly available.",
}

### Contributions

Thanks to [@forest1988](https://github.com/forest1988) for adding this dataset.
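Provided by:
tmu-nlp
Summary of original information

TMU-GFM-Dataset Overview

Dataset Description

Dataset Summary

The authors collected manual evaluations of the grammaticality, fluency, and meaning preservation of system outputs for 1,381 sentences from CoNLL 2013. Each source sentence was corrected by the following five typical systems: statistical machine translation (SMT), recurrent neural network (RNN), convolutional neural network (CNN), self-attention network (SAN), and SAN with copy mechanism (SAN+Copy). Manual evaluations were assigned to a total of 4,223 sentences.

Supported Tasks and Leaderboards

Grammatical error correction

Languages

English

Dataset Structure

Data Instances

An example from the TMU-GFM-Dataset looks as follows: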

```json
{
  "ave_f": 3.4000000953674316,
  "ave_g": 3.4000000953674316,
  "ave_m": 3.5999999046325684,
  "fluency": [3, 4, 3, 4, 3],
  "grammer": [3, 4, 3, 4, 3],
  "meaning": [3, 4, 4, 4, 3],
  "output": "After all, there will be an endless battle between the technology and human mentality.",
  "source": "Afterall there will be an endless battle between the technology and human mentality.",
  "system": "lstm,cnn"
}
```

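Data Fields

The dataset contains 9 columns:

  • source: the source sentence.
  • output: the system output sentence.
  • grammer: grammaticality annotations by 5 annotators (the field name is spelled this way in the dataset).
  • fluency: fluency annotations by 5 annotators.
  • meaning: meaning preservation annotations by 5 annotators.
  • system: the system that produced the output sentence.
  • ave_g: average grammaticality score.
  • ave_f: average fluency score.
  • ave_m: average meaning preservation score.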
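
As a minimal sketch of how these fields relate, the snippet below rebuilds the sample record from the data instance and recomputes the three averages from the per-annotator lists. It assumes that `ave_g`, `ave_f`, and `ave_m` are plain arithmetic means stored as float32, and that a comma in `system` separates systems that produced the same output. Both are inferences from the sample record, not documented behavior, and the helper names `to_float32`, `recompute_averages`, and `systems_of` are hypothetical.

```python
import struct

# Sample record copied from the dataset card.
record = {
    "source": "Afterall there will be an endless battle between the technology and human mentality.",
    "output": "After all, there will be an endless battle between the technology and human mentality.",
    "grammer": [3, 4, 3, 4, 3],
    "fluency": [3, 4, 3, 4, 3],
    "meaning": [3, 4, 4, 4, 3],
    "system": "lstm,cnn",
    "ave_g": 3.4000000953674316,
    "ave_f": 3.4000000953674316,
    "ave_m": 3.5999999046325684,
}


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def recompute_averages(rec: dict) -> dict:
    """Assumed relation: each ave_* is the mean of its annotator scores."""
    pairs = {"ave_g": "grammer", "ave_f": "fluency", "ave_m": "meaning"}
    return {
        avg_key: to_float32(sum(rec[list_key]) / len(rec[list_key]))
        for avg_key, list_key in pairs.items()
    }


def systems_of(rec: dict) -> list:
    """Assumed convention: comma-separated system names in one string."""
    return rec["system"].split(",")


averages = recompute_averages(record)
print(averages)
print(systems_of(record))
```

For the sample record the recomputed values agree with the stored ones, which is consistent with the mean-of-five assumption, though one record cannot confirm it for the whole dataset.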

Data Splits

The authors divided the dataset into train/dev/test sets of 3,376/422/423 sentences and used them to fine-tune BERT in their paper.
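
The three split sizes sum to the 4,221 released examples (3,376 + 422 + 423). The card does not say how sentences were assigned to each split, so the following is only a sketch that reproduces the sizes with a seeded shuffle. The seed and the function name `split_indices` are assumptions, and the resulting partition will not match the authors' own.

```python
import random

# Split sizes reported in the dataset card.
SPLIT_SIZES = {"train": 3376, "dev": 422, "test": 423}


def split_indices(n_examples: int, sizes: dict, seed: int = 0) -> dict:
    """Partition range(n_examples) into disjoint index lists of the given sizes."""
    if sum(sizes.values()) != n_examples:
        raise ValueError("split sizes must sum to the number of examples")
    indices = list(range(n_examples))
    # A private Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


splits = split_indices(4221, SPLIT_SIZES)
print({name: len(idx) for name, idx in splits.items()})
# {'train': 3376, 'dev': 422, 'test': 423}
```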

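Dataset Creation

Curation Rationale

The authors proposed a reference-less metric for grammatical error correction (GEC) that is trained on manual evaluations of system outputs. They note that although previous studies have shown reference-less metrics to be promising, existing metrics are not optimized for manual evaluation of system outputs, because no dataset of system outputs with manual evaluations existed. To achieve a better correlation with manual evaluation, they created a dataset for optimizing each sub-metric against manual evaluations of GEC systems. Their annotators evaluated the outputs of five typical GEC systems.

Source Data

Initial Data Collection and Normalization

The authors collected manual evaluations of the grammaticality, fluency, and meaning preservation of system outputs for 1,381 sentences from CoNLL 2013. Each source sentence was corrected by the following five typical systems: statistical machine translation (SMT), recurrent neural network (RNN), convolutional neural network (CNN), self-attention network (SAN), and SAN with copy mechanism (SAN+Copy).

Who are the source language producers?

Machine-generated

Annotations

Annotation process

After excluding duplicate corrected sentences, manual evaluations were assigned to a total of 4,223 sentences, as follows:

  • Grammaticality: annotators evaluated the grammatical correctness of the system output. The authors followed the five-point scale evaluation criteria proposed by Heilman et al. (2014) (4: Perfect, 3: Comprehensible, 2: Somewhat comprehensible, 1: Incomprehensible, 0: Other).
  • Fluency: annotators evaluated how natural the sentence sounds to native speakers. The authors followed the criteria proposed by Lau et al. (2015) (4: Extremely natural, 3: Somewhat natural, 2: Somewhat unnatural, 1: Extremely unnatural).
  • Meaning preservation: annotators evaluated the extent to which the meaning of the source sentence is preserved in the system output. The authors followed the criteria proposed by Xu et al. (2016) (4: Identical, 3: Minor differences, 2: Moderate differences, 1: Substantially different, 0: Other).

Finally, the authors created a dataset of 4,221 sentences, excluding sentences for which three or more annotators answered "0: Other."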
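
The exclusion rule can be sketched as a simple filter. The card does not state whether the three "0: Other" answers must fall on the same criterion, so this sketch assumes a sentence is dropped when either criterion whose scale defines a 0 (grammaticality or meaning preservation) received three or more zeros. The function names and the toy records are hypothetical and are not taken from the dataset.

```python
OTHER = 0            # the "0: Other" label
MIN_OTHER_VOTES = 3  # "three or more annotators"
# Only these two scales define a 0 label; the fluency scale runs from 1 to 4.
CRITERIA_WITH_OTHER = ("grammer", "meaning")


def is_excluded(rec: dict) -> bool:
    """True if any criterion has at least MIN_OTHER_VOTES 'Other' answers."""
    return any(
        rec[criterion].count(OTHER) >= MIN_OTHER_VOTES
        for criterion in CRITERIA_WITH_OTHER
    )


def filter_records(records: list) -> list:
    """Keep only the records that pass the exclusion rule."""
    return [rec for rec in records if not is_excluded(rec)]


# Toy annotation lists (hypothetical, for illustration only).
toy_records = [
    {"grammer": [3, 4, 3, 4, 3], "fluency": [3, 4, 3, 4, 3], "meaning": [3, 4, 4, 4, 3]},
    {"grammer": [0, 0, 0, 2, 1], "fluency": [1, 1, 2, 1, 1], "meaning": [0, 0, 1, 1, 2]},
    {"grammer": [0, 0, 3, 3, 4], "fluency": [3, 3, 4, 3, 3], "meaning": [4, 4, 4, 3, 0]},
]

kept = filter_records(toy_records)
print(len(kept))  # 2: only the second toy record is dropped
```

Under this reading, applying the rule to the 4,223 annotated sentences would remove the 2 sentences that account for the final count of 4,221.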

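Who are the annotators?

Five native English-speaking annotators recruited via Amazon Mechanical Turk.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information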

```bibtex
@inproceedings{yoshimura-etal-2020-reference,
    title = "{SOME}: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction",
    author = "Yoshimura, Ryoma and Kaneko, Masahiro and Kajiwara, Tomoyuki and Komachi, Mamoru",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.573",
    pages = "6516--6522",
    abstract = "We propose a reference-less metric trained on manual evaluations of system outputs for grammatical error correction (GEC). Previous studies have shown that reference-less metrics are promising; however, existing metrics are not optimized for manual evaluations of the system outputs because no dataset of the system output exists with manual evaluation. This study manually evaluates outputs of GEC systems to optimize the metrics. Experimental results show that the proposed metric improves correlation with the manual evaluation in both system- and sentence-level meta-evaluation. Our dataset and metric will be made publicly available.",
}
```

Contributions

Thanks to @forest1988 for adding this dataset.
