waboucay/wikilarge

Name: waboucay/wikilarge
Creator: waboucay
Published: 2024-04-02 15:08:19
License: No description available

Hugging Face, updated 2024-04-02, indexed 2024-06-11

Download link:

https://hf-mirror.com/datasets/waboucay/wikilarge


Resource description:

---
language:
- en
task_categories:
- text2text-generation
---

# WikiLarge

HuggingFace implementation of the WikiLarge corpus for sentence simplification gathered by Zhang, Xingxing and Lapata, Mirella.

/!\ I am not one of the creators of the dataset, I just needed a HF version of this dataset and uploaded it.

I encourage you to read the paper introducing the dataset: [Sentence Simplification with Deep Reinforcement Learning](https://aclanthology.org/D17-1062) (Zhang & Lapata, EMNLP 2017)

## Uses

This dataset can be used to train sentence simplification models.

## Dataset Structure

- **Size of the generated dataset:** 69.3 MB

An example of 'train' looks as follows.

```
{
  'complex': 'Sensing of both the external and internal environments at the cellular level relies on signal transduction . Many disease processes , such as diabetes , heart disease , autoimmunity , and cancer arise from defects in signal transduction pathways , further highlighting the critical importance of signal transduction to biology , as well as medicine .',
  'simple': 'A signal transduction in biology , is a cellular mechanism .'
}
```

## Citation

**BibTeX:**

```
@InProceedings{D17-1062,
  author = "Zhang, Xingxing and Lapata, Mirella",
  title = "Sentence Simplification with Deep Reinforcement Learning",
  booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "584--594",
  location = "Copenhagen, Denmark",
  url = "http://aclweb.org/anthology/D17-1062"
}
```

**ACL:**

Xingxing Zhang and Mirella Lapata. 2017. Sentence Simplification with Deep Reinforcement Learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 584–594, Copenhagen, Denmark. Association for Computational Linguistics.


Provided by:

waboucay

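Summary of original information

Dataset overview

Name: WikiLarge

Description: WikiLarge is a sentence simplification corpus collected by Xingxing Zhang and Mirella Lapata. This dataset is a Hugging Face version of the corpus, intended mainly for training sentence simplification models.

Language: English

Dataset size: 69.3 MB

Dataset structure

Data example: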

{
  'complex': 'Sensing of both the external and internal environments at the cellular level relies on signal transduction . Many disease processes , such as diabetes , heart disease , autoimmunity , and cancer arise from defects in signal transduction pathways , further highlighting the critical importance of signal transduction to biology , as well as medicine .',
  'simple': 'A signal transduction in biology , is a cellular mechanism .'
}
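As a rough illustration of the record structure, the sketch below holds the example above as a Python dictionary and compares the lengths of its two sides. The helper names `tokens` and `compression_ratio` are hypothetical and not part of the dataset or any library. Splitting on whitespace is an assumption based on how the sample text is spaced around punctuation.

```python
from typing import Dict, List

# Record copied from the data example; the field names 'complex' and
# 'simple' are the ones shown on the dataset card.
record: Dict[str, str] = {
    "complex": (
        "Sensing of both the external and internal environments at the "
        "cellular level relies on signal transduction . Many disease "
        "processes , such as diabetes , heart disease , autoimmunity , and "
        "cancer arise from defects in signal transduction pathways , further "
        "highlighting the critical importance of signal transduction to "
        "biology , as well as medicine ."
    ),
    "simple": "A signal transduction in biology , is a cellular mechanism .",
}


def tokens(text: str) -> List[str]:
    """Split on whitespace (assumes the text is already space-tokenized)."""
    return text.split()


def compression_ratio(rec: Dict[str, str]) -> float:
    """Token count of the simple side divided by that of the complex side."""
    return len(tokens(rec["simple"])) / len(tokens(rec["complex"]))


if __name__ == "__main__":
    print(len(tokens(record["simple"])), "tokens on the simple side")
    print(f"compression ratio: {compression_ratio(record):.2f}")
```

A ratio below 1 means the simple side is shorter than the complex side, which is the case for this record.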

Use cases

Use: training sentence simplification models.
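For a text-to-text model, one plausible way to prepare such records is to map each one to a source/target pair. The following is a minimal sketch under stated assumptions: the `simplify: ` prefix, the `source`/`target` key names, and the `to_text2text` helper are all hypothetical choices for illustration, and the sample records are abbreviated or made up rather than taken from the full dataset.

```python
from typing import Dict, Iterable, List

# Hypothetical task prefix in the style of prefix-conditioned text-to-text
# models; it is not part of the dataset.
PREFIX = "simplify: "


def to_text2text(
    records: Iterable[Dict[str, str]], prefix: str = PREFIX
) -> List[Dict[str, str]]:
    """Map {'complex', 'simple'} records to {'source', 'target'} pairs.

    Records with an empty side after stripping whitespace are skipped.
    """
    pairs = []
    for rec in records:
        src, tgt = rec["complex"].strip(), rec["simple"].strip()
        if src and tgt:
            pairs.append({"source": prefix + src, "target": tgt})
    return pairs


# Illustrative input: the first record abbreviates the card's example,
# the second is invented to show the empty-side filter.
sample = [
    {
        "complex": "Sensing of both the external and internal environments "
        "at the cellular level relies on signal transduction .",
        "simple": "A signal transduction in biology , is a cellular mechanism .",
    },
    {"complex": "   ", "simple": "Skipped because the complex side is empty ."},
]

pairs = to_text2text(sample)
```

Keeping the prefix as a parameter makes it easy to drop (pass `prefix=""`) for models that do not use task prefixes.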

Citation

BibTeX:

@InProceedings{D17-1062,
  author = "Zhang, Xingxing and Lapata, Mirella",
  title = "Sentence Simplification with Deep Reinforcement Learning",
  booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "584--594",
  location = "Copenhagen, Denmark",
  url = "http://aclweb.org/anthology/D17-1062"
}

ACL: Xingxing Zhang and Mirella Lapata. 2017. Sentence Simplification with Deep Reinforcement Learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 584–594, Copenhagen, Denmark. Association for Computational Linguistics.

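Collected summary

Dataset introduction

Background and challenges

Background overview

WikiLarge is a sentence simplification dataset containing complex sentences paired with their simplified versions, suitable for training sentence simplification models. The dataset is 69.3 MB in size and was introduced by Zhang and Lapata in 2017.

The content above was collected and summarized by 遇见数据集.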
