waboucay/turk_corpus

Name: waboucay/turk_corpus
Creator: waboucay
Published: 2024-04-04 09:37:50
License: No description available

Source: Hugging Face. Updated 2024-04-04; indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/waboucay/turk_corpus


Resource description:

---
language:
- en
task_categories:
- text2text-generation
---

# Turk Corpus

HuggingFace implementation of the Turk corpus for sentence simplification gathered by Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen and Chris Callison-Burch.

/!\ I am not one of the creators of the dataset, I just needed a HF version of this dataset and uploaded it. I encourage you to read the paper introducing the dataset: [Optimizing Statistical Machine Translation for Text Simplification](https://aclanthology.org/Q16-1029/) (2016)

## Uses

This dataset can be used to evaluate sentence simplification models.

## Dataset Structure

- **Size of the generated dataset:** 2.4 MB

An example of 'test' looks as follows.

```
{
  'complex': 'One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed , a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes of the northern Rizeigat region in Sudan .',
  'simple': [
    'One side of the armed conflicts is made of Sudanese military and the Janjaweed , a Sudanese militia recruited from the Afro-Arab Abbala tribes of the northern Rizeigat region in Sudan .',
    'One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed, a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes of the northern Rizeigat regime in Sudan.',
    'One side of the armed conflicts is made up mostly of the Sudanese military and the Janjaweed, a Sudanese militia group whose recruits mostly come from the Afro-Arab Abbala tribes from the northern Rizeigat region in Sudan.',
    'One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed , a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes in Sudan .',
    'One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed , a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes of the northern Rizeigat region in Sudan .',
    'One side of the armed conflicts consist of the Sudanese military and the Sudanese militia group Janjaweed.',
    'The Sudanese military and the Janjaweed make up one of the armed conflicts, mostly from the Afro-Arab Abbal tribes in Sudan.',
    'One side of the armed conflicts is mainly Sudanese military and the Janjaweed, which recruited from the Afro-Arab Abbala tribes.'
  ]
}
```

## Citation

**BibTeX:**

```
@article{xu-etal-2016-optimizing,
    title = "Optimizing Statistical Machine Translation for Text Simplification",
    author = "Xu, Wei and Napoles, Courtney and Pavlick, Ellie and Chen, Quanze and Callison-Burch, Chris",
    editor = "Lee, Lillian and Johnson, Mark and Toutanova, Kristina",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "4",
    year = "2016",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q16-1029",
    doi = "10.1162/tacl_a_00107",
    pages = "401--415",
}
```

**ACL:**

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics, 4:401–415.

Provider:

waboucay

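Summary of original information

Dataset overview

Dataset name

Turk Corpus

Dataset description

This dataset is a HuggingFace implementation of the Turk corpus for sentence simplification. The corpus was gathered by Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen and Chris Callison-Burch.

Dataset uses

This dataset is mainly used to evaluate sentence simplification models.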
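
To illustrate what evaluating against several reference simplifications can look like, here is a minimal sketch. It scores each system output by bag-of-words F1 against its closest reference and averages over the test set. This is a deliberately simple stand-in, not the SARI metric introduced in the cited paper, and the function names `token_f1`, `best_reference_f1`, and `corpus_score` are hypothetical.

```python
from collections import Counter
from typing import List


def token_f1(candidate: str, reference: str) -> float:
    """Bag-of-words F1 between two whitespace-tokenised sentences."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_reference_f1(candidate: str, references: List[str]) -> float:
    """Score one system output against its closest reference."""
    return max((token_f1(candidate, r) for r in references), default=0.0)


def corpus_score(outputs: List[str], references: List[List[str]]) -> float:
    """Average best-reference F1 over aligned outputs and reference lists."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must be aligned")
    if not outputs:
        return 0.0
    total = sum(best_reference_f1(o, r) for o, r in zip(outputs, references))
    return total / len(outputs)
```

In practice, `outputs` would hold one model simplification per test sentence and `references` the corresponding lists of human simplifications. An established simplification metric is likely a better choice for reported results.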

Dataset structure

Dataset size: 2.4 MB
Data example: each record contains one complex sentence and several simplified versions of it.
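
The record layout can be sketched in Python as follows. The field names `complex` and `simple` come from the 'test' example on the dataset card. Everything else is an assumption for illustration: the helper names `load_turk_test` and `compression_ratios`, the toy record, and the guess that the repository loads with its default configuration and exposes a split named `test`.

```python
from typing import List, TypedDict


class TurkRecord(TypedDict):
    # Field names follow the 'test' example shown on the dataset card.
    complex: str
    simple: List[str]


def load_turk_test():
    """Load the test split from the Hugging Face Hub (needs network access).

    Assumption: the default configuration works and a 'test' split exists;
    the card only shows a single 'test' example.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("waboucay/turk_corpus", split="test")


def compression_ratios(record: TurkRecord) -> List[float]:
    """Token-count ratio of each simplification to the complex sentence."""
    n_complex = len(record["complex"].split())
    if n_complex == 0:
        return [0.0 for _ in record["simple"]]
    return [len(s.split()) / n_complex for s in record["simple"]]


# Hypothetical toy record (not taken from the dataset) showing the shape.
example: TurkRecord = {
    "complex": "The committee subsequently ratified the proposal .",
    "simple": [
        "The committee later approved the plan .",
        "The committee approved it .",
    ],
}
ratios = compression_ratios(example)
```

A ratio below 1 means the simplification is shorter than the source sentence. Calling `load_turk_test()` is left out of the example so the snippet runs offline.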

