Gabriel/wiki_lingua_swe
Hugging Face. Updated 2022-10-29; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Gabriel/wiki_lingua_swe
Resource description:
---
language:
- sv
license:
- cc-by-sa-3.0
size_categories:
- 10K<n<100K
source_datasets:
- https://github.com/morningmoni/CiteSu
task_categories:
- summarization
- text2text-generation
task_ids: []
tags:
- conditional-text-generation
---
# Dataset Card for Swedish Wiki_lingua Dataset
The Swedish wiki_lingua dataset was produced purely by machine translation, with the aim of improving downstream fine-tuning on Swedish summarization tasks.
## Dataset Summary
For full details, see the original multilingual version: https://huggingface.co/datasets/wiki_lingua
### Data details
- gem_id: the id for the data instance.
- gem_id_parent: the id of the parent data instance.
- Document: a string containing the document body.
- Summary: a string containing the summary of the body.
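As a rough illustration of the fields listed above, the following Python sketch models one record and turns it into an input/target pair for text2text fine-tuning. The `WikiLinguaSweRecord` type, the `to_text2text_pair` helper, the `summarize: ` prefix, and the sample values are all hypothetical; the field names follow the capitalization used in this card and may differ in the published files.

```python
from typing import Dict, TypedDict


class WikiLinguaSweRecord(TypedDict):
    """One data instance, using the field names as written in this card (assumption)."""
    gem_id: str
    gem_id_parent: str
    Document: str
    Summary: str


def to_text2text_pair(record: WikiLinguaSweRecord, prefix: str = "summarize: ") -> Dict[str, str]:
    """Build a source/target pair for a seq2seq model.

    The prefix is a hypothetical convention, not something the dataset prescribes.
    """
    return {
        "input_text": prefix + record["Document"].strip(),
        "target_text": record["Summary"].strip(),
    }


# Made-up sample record, for illustration only (not taken from the dataset).
sample: WikiLinguaSweRecord = {
    "gem_id": "example-train-0",
    "gem_id_parent": "example-train-0",
    "Document": "  Tvätta händerna med tvål och vatten i minst 20 sekunder.  ",
    "Summary": "Tvätta händerna noggrant. ",
}

pair = to_text2text_pair(sample)

if __name__ == "__main__":
    print(pair["input_text"])
    print(pair["target_text"])
```

Keeping the conversion in a small pure function makes it easy to apply per record with whatever data-loading tool is used.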
### Data Splits
The Swedish wiki_lingua dataset follows the same splits as the original English version and has 3 splits: _train_, _validation_, and _test_.
| Dataset Split | Number of Instances in Split |
| ------------- | ------------------------------------------- |
| Train | 95,516 |
| Validation | 27,489 |
| Test | 13,340 |
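The split sizes in the table can be sanity-checked with a few lines of Python. This is a minimal sketch: `SPLIT_SIZES`, `split_shares`, and `check_loaded_splits` are hypothetical names, and the commented-out loading step assumes the Hugging Face `datasets` package, network access, and that the repository id and split names match this card.

```python
from typing import Dict, List, Mapping, Sized

# Instance counts copied from the table above.
SPLIT_SIZES: Dict[str, int] = {"train": 95_516, "validation": 27_489, "test": 13_340}


def split_shares(sizes: Mapping[str, int]) -> Dict[str, float]:
    """Return each split's share of the total, in percent, rounded to one decimal."""
    total = sum(sizes.values())
    return {name: round(100 * n / total, 1) for name, n in sizes.items()}


def check_loaded_splits(dataset: Mapping[str, Sized],
                        expected: Mapping[str, int] = SPLIT_SIZES) -> List[str]:
    """Return the names of splits that are missing or whose row count differs."""
    return [
        name for name, n in expected.items()
        if name not in dataset or len(dataset[name]) != n
    ]


if __name__ == "__main__":
    print(sum(SPLIT_SIZES.values()))   # total number of instances
    print(split_shares(SPLIT_SIZES))   # percentage per split
    # Optional check against the published files (assumption: requires the
    # `datasets` package and network access):
    # from datasets import load_dataset
    # ds = load_dataset("Gabriel/wiki_lingua_swe")
    # print(check_loaded_splits(ds))   # an empty list means the counts match
```

With the numbers from the table, the three splits sum to 136,345 instances, split roughly 70% / 20% / 10% between train, validation, and test.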
Provider:
Gabriel
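Summary of original information
Swedish Wiki_lingua dataset overview
Dataset overview
- Language: Swedish
- License: CC-BY-SA-3.0
- Size: 10K<n<100K
- Source: https://github.com/morningmoni/CiteSu
- Task categories: summarization, text2text generation
- Tags: conditional text generation
Dataset details
- Data instance IDs: gem_id, gem_id_parent
- Document content: Document (string)
- Summary content: Summary (string)
Data splits
- Split scheme: same as the original English version
- Split types: train, validation, test
- Number of instances:
  - Train: 95,516
  - Validation: 27,489
  - Test: 13,340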



