five

ZhongshengWang/Alpaca-cnn-dailymail

收藏
Hugging Face2023-09-19 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/ZhongshengWang/Alpaca-cnn-dailymail
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - no-annotation language_creators: - found language: - en license: - apache-2.0 multilinguality: - monolingual size_categories: - 100K<n<1M source_datasets: - original task_categories: - summarization - text-generation task_ids: [] paperswithcode_id: cnn-daily-mail-1 pretty_name: CNN / Daily Mail tags: - conditional-text-generation --- ## Data Summary Data set Alpaca-cnn-dailymail is a data set version format changed by [ccdv/cnn_dailymail](https://huggingface.co/datasets/ccdv/cnn_dailymail) to meet Alpaca fine-tuning Llama2. Only versions 3.0.0 and 2.0.0 were used for merging and as a key data set for the summary extraction task. ## Licensing Information The Alpaca-cnn-dailymail dataset version 1.0.0 is released under the Apache-2.0 License. ## Citation Information ``` @inproceedings{see-etal-2017-get, title = "Get To The Point: Summarization with Pointer-Generator Networks", author = "See, Abigail and Liu, Peter J. and Manning, Christopher D.", booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2017", address = "Vancouver, Canada", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/P17-1099", doi = "10.18653/v1/P17-1099", pages = "1073--1083", abstract = "Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.", } ``` ``` @inproceedings{DBLP:conf/nips/HermannKGEKSB15, author={Karl Moritz Hermann and Tomás Kociský and Edward Grefenstette and Lasse Espeholt and Will Kay and Mustafa Suleyman and Phil Blunsom}, title={Teaching Machines to Read and Comprehend}, year={2015}, cdate={1420070400000}, pages={1693-1701}, url={http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend}, booktitle={NIPS}, crossref={conf/nips/2015} } ```
提供机构:
ZhongshengWang
原始信息汇总

数据集概述

基本信息

  • 数据集名称: Alpaca-cnn-dailymail
  • 语言: 英语
  • 许可证: Apache-2.0
  • 多语言性: 单语种
  • 数据集大小: 100K<n<1M
  • 源数据集: 原始数据集
  • 任务类别: 摘要生成、文本生成
  • 标签: 条件文本生成

详细信息

  • 数据集版本: 仅使用版本3.0.0和2.0.0进行合并,用于摘要提取任务。
  • 数据集来源: 由ccdv/cnn_dailymail数据集版本格式更改而来,以适应Alpaca对Llama2的微调。

许可证信息

  • 许可证: Apache-2.0 License

引用信息

@inproceedings{see-etal-2017-get, title = "Get To The Point: Summarization with Pointer-Generator Networks", author = "See, Abigail and Liu, Peter J. and Manning, Christopher D.", booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2017", address = "Vancouver, Canada", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/P17-1099", doi = "10.18653/v1/P17-1099", pages = "1073--1083", abstract = "Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.", }

@inproceedings{DBLP:conf/nips/HermannKGEKSB15, author={Karl Moritz Hermann and Tomás Kociský and Edward Grefenstette and Lasse Espeholt and Will Kay and Mustafa Suleyman and Phil Blunsom}, title={Teaching Machines to Read and Comprehend}, year={2015}, cdate={1420070400000}, pages={1693-1701}, url={http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend}, booktitle={NIPS}, crossref={conf/nips/2015} }

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作