
walzen/gigaword

Source: Hugging Face · Updated 2026-01-16 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/walzen/gigaword
Resource description:
---
license: apache-2.0
task_categories:
- summarization
language:
- en
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Gigaword (Repackaged)

Putting it here for better accessibility.

## Description

Headline-generation on a corpus of article pairs from Gigaword consisting of around 4 million articles. Use the 'org_data' provided by https://github.com/microsoft/unilm/ which is identical to https://github.com/harvardnlp/sent-summary but with better format.

There are two features:

- **document**: article.
- **summary**: headline.

**Homepage:** https://github.com/harvardnlp/sent-summary

**Original Source code:** tfds.summarization.Gigaword

## Dataset Statistics

- **Versions:** 1.2.0 (default)
- **Download size:** 551.61 MiB
- **Dataset size:** 1.02 GiB

### Splits

| Split | Examples |
| :--- | :--- |
| 'test' | 1,951 |
| 'train' | 3,803,957 |
| 'validation' | 189,651 |

## Feature Structure

```python
FeaturesDict({
    'document': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})
```

## Citation

```
@article{graff2003english,
  title={English gigaword},
  author={Graff, David and Kong, Junbo and Chen, Ke and Maeda, Kazuaki},
  journal={Linguistic Data Consortium, Philadelphia},
  volume={4},
  number={1},
  pages={34},
  year={2003}
}

@article{Rush_2015,
  title={A Neural Attention Model for Abstractive Sentence Summarization},
  url={http://dx.doi.org/10.18653/v1/D15-1044},
  DOI={10.18653/v1/d15-1044},
  journal={Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
  publisher={Association for Computational Linguistics},
  author={Rush, Alexander M. and Chopra, Sumit and Weston, Jason},
  year={2015}
}
```

Original TFDS Catalog: https://www.tensorflow.org/datasets/catalog/gigaword
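
## Example Usage

The card does not include a loading snippet, so the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the id `walzen/gigaword` (taken from the page title and download link). The helper names `split_fractions` and `load_gigaword` are hypothetical and exist only for illustration; the split sizes are copied from the Splits table.

```python
# Split sizes copied from the "Splits" table of the dataset card.
SPLIT_SIZES = {"train": 3_803_957, "validation": 189_651, "test": 1_951}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_gigaword(split="validation"):
    """Load one split of the repackaged dataset (hypothetical helper).

    Assumption: the `datasets` package is installed and the repo id
    "walzen/gigaword" is reachable. The import is done lazily so the
    statistics helper above works without the dependency.
    """
    from datasets import load_dataset

    return load_dataset("walzen/gigaword", split=split)


if __name__ == "__main__":
    # Summarize the split sizes without touching the network.
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {SPLIT_SIZES[name]:,} examples ({share:.2%})")
```

Calling `load_gigaword("test")` should return a dataset whose rows expose the `document` and `summary` fields described above. The three splits together contain 3,995,559 pairs, which matches the "around 4 million" figure in the description.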
Provider:
walzen