
tadeodonegana/samsum-es

Source: Hugging Face. Updated 2023-12-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tadeodonegana/samsum-es
Resource overview:
---
annotations_creators:
- expert-generated
language_creators:
- translated
language:
- es
license:
- cc-by-nc-nd-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- samsum
task_categories:
- summarization
task_ids: []
pretty_name: SAMSum Corpus (es)
tags:
- conversations-summarization
dataset_info:
  features:
  - name: text
    dtype: string
  - name: target
    dtype: string
  splits:
  - name: train
    num_bytes: 10105743
    num_examples: 14730
  - name: validation
    num_bytes: 559296
    num_examples: 818
  - name: test
    num_bytes: 580074
    num_examples: 819
  download_size: 7111425
  dataset_size: 11245113
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Dataset Card for SAMSum Corpus (es)

## Dataset Description

The [samsum](https://huggingface.co/datasets/samsum) dataset, translated into Spanish.

### Links

- **Original samsum dataset repository:** https://huggingface.co/datasets/samsum
- **Paper:** https://arxiv.org/abs/1911.12237v2

### Languages

Spanish (translated from the English [samsum](https://huggingface.co/datasets/samsum) dataset using GPT-3.5 Turbo)

## Dataset Structure

### Data Fields

- text: text of the dialogue.
- target: human-written summary of the dialogue.

### Data Splits

- train: 14730
- validation: 818
- test: 819

## Licensing Information

Non-commercial license: CC BY-NC-ND 4.0

## Citation Information

```
@inproceedings{gliwa-etal-2019-samsum,
    title = "{SAMS}um Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization",
    author = "Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander",
    booktitle = "Proceedings of the 2nd Workshop on New Frontiers in Summarization",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-5409",
    doi = "10.18653/v1/D19-5409",
    pages = "70--79"
}
```
Provider:
tadeodonegana
Summary of original information

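Dataset Card for SAMSum Corpus (es)

Dataset Description

The samsum dataset, translated into Spanish.

Languages

Spanish (translated from the English samsum dataset using GPT-3.5 Turbo)

Dataset Structure

Data Fields

  • text: text of the dialogue.
  • target: human-written summary of the dialogue.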
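
The two fields map directly onto a summarization input/output pair. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id is `tadeodonegana/samsum-es` as in the page title. The helper names `load_split` and `to_pair` and the sample record are hypothetical and do not come from the dataset card.

```python
from typing import Dict, Tuple


def load_split(split: str = "train"):
    """Load one split of the dataset from the Hugging Face Hub.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    The import is local so the rest of this sketch also runs offline.
    """
    from datasets import load_dataset

    return load_dataset("tadeodonegana/samsum-es", split=split)


def to_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Return (dialogue, summary) from a record with `text` and `target` fields."""
    return record["text"].strip(), record["target"].strip()


# Hypothetical record, invented here only to illustrate the field layout.
example = {
    "text": "Ana: ¿Vienes a cenar?\nLuis: Sí, llego a las ocho.",
    "target": "Luis llegará a cenar a las ocho.",
}
dialogue, summary = to_pair(example)
print(summary)
```

With network access, `to_pair(load_split("validation")[0])` would apply the same helper to a real record.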

Data Splits

  • train: 14730
  • validation: 818
  • test: 819
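
As a quick sanity check on these counts, the sketch below sums the three splits and computes each split's share of the corpus. The function name `split_shares` is hypothetical; the numbers are the ones listed above.

```python
from typing import Dict, Tuple

# Split sizes exactly as listed in the dataset card.
SPLITS = {"train": 14730, "validation": 818, "test": 819}


def split_shares(splits: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total example count and each split's fraction of it."""
    total = sum(splits.values())
    return total, {name: count / total for name, count in splits.items()}


total, shares = split_shares(SPLITS)
print(total)  # 16367
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The total of 16367 examples falls inside the `10K<n<100K` size category declared in the card metadata, and the split is roughly 90% train, 5% validation, and 5% test.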

Licensing Information

Non-commercial license: CC BY-NC-ND 4.0

Citation Information

@inproceedings{gliwa-etal-2019-samsum,
    title = "{SAMS}um Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization",
    author = "Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander",
    booktitle = "Proceedings of the 2nd Workshop on New Frontiers in Summarization",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-5409",
    doi = "10.18653/v1/D19-5409",
    pages = "70--79"
}
