five

Olivia-Margaret-Evans5/xsum

收藏
Hugging Face2026-04-06 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Olivia-Margaret-Evans5/xsum
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - found language_creators: - found language: - en license: - unknown multilinguality: - monolingual pretty_name: Extreme Summarization (XSum) paperswithcode_id: xsum size_categories: - 100K<n<1M source_datasets: - original task_categories: - summarization task_ids: - news-articles-summarization dataset_info: features: - name: document dtype: string - name: summary dtype: string - name: id dtype: string splits: - name: train num_bytes: 479206363 num_examples: 204045 - name: validation num_bytes: 26292877 num_examples: 11332 - name: test num_bytes: 26756141 num_examples: 11334 download_size: 332791351 dataset_size: 532255381 configs: - config_name: default data_files: - split: train path: data/train-* - split: validation path: data/validation-* - split: test path: data/test-* train-eval-index: - config: default task: summarization task_id: summarization splits: train_split: train eval_split: test col_mapping: document: text summary: target metrics: - type: rouge name: Rouge --- # Dataset Card for "xsum" ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Instances](#data-instances) - [Data Fields](#data-fields) - [Data Splits](#data-splits) - [Dataset Creation](#dataset-creation) - [Curation Rationale](#curation-rationale) - [Source Data](#source-data) - [Annotations](#annotations) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Dataset Curators](#dataset-curators) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) - [Contributions](#contributions) ## Dataset Description - **Homepage:** - **Repository:** https://github.com/EdinburghNLP/XSum - **Paper:** [Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization](https://arxiv.org/abs/1808.08745) - **Point of Contact:** [Shashi Narayan](mailto:shashi.narayan@ed.ac.uk) - **Size of downloaded dataset files:** 257.30 MB - **Size of the generated dataset:** 532.26 MB - **Total amount of disk used:** 789.56 MB ### Dataset Summary Extreme Summarization (XSum) Dataset. There are three features: - document: Input news article. - summary: One sentence summary of the article. - id: BBC ID of the article. ### Supported Tasks and Leaderboards [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Languages [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Dataset Structure ### Data Instances #### default - **Size of downloaded dataset files:** 257.30 MB - **Size of the generated dataset:** 532.26 MB - **Total amount of disk used:** 789.56 MB An example of 'validation' looks as follows. ``` { "document": "some-body", "id": "29750031", "summary": "some-sentence" } ``` ### Data Fields The data fields are the same among all splits. #### default - `document`: a `string` feature. - `summary`: a `string` feature. - `id`: a `string` feature. ### Data Splits | name |train |validation|test | |-------|-----:|---------:|----:| |default|204045| 11332|11334| ## Dataset Creation ### Curation Rationale [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Source Data #### Initial Data Collection and Normalization [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the source language producers? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Annotations #### Annotation process [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the annotators? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Personal and Sensitive Information [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Considerations for Using the Data ### Social Impact of Dataset [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Discussion of Biases [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Other Known Limitations [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Additional Information ### Dataset Curators [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Licensing Information [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Citation Information ``` @article{Narayan2018DontGM, title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization}, author={Shashi Narayan and Shay B. Cohen and Mirella Lapata}, journal={ArXiv}, year={2018}, volume={abs/1808.08745} } ``` ### Contributions Thanks to [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun), [@mariamabarham](https://github.com/mariamabarham), [@jbragg](https://github.com/jbragg), [@lhoestq](https://github.com/lhoestq), [@patrickvonplaten](https://github.com/patrickvonplaten) for adding this dataset.
提供机构:
Olivia-Margaret-Evans5
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作