hezarai/xlsum-fa
Source: Hugging Face. Updated 2024-05-07; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/hezarai/xlsum-fa
Resource description:
---
language:
- fa
task_categories:
- summarization
pretty_name: XLSum Persian
dataset_info:
  features:
  - name: text
    dtype: string
  - name: summary
    dtype: string
  - name: title
    dtype: string
  splits:
  - name: train
    num_bytes: 326304750
    num_examples: 53128
  - name: test
    num_bytes: 29490737
    num_examples: 5906
  download_size: 168140025
  dataset_size: 355795487
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
The Persian portion of the [XLSum](https://huggingface.co/datasets/csebuetnlp/xlsum) dataset.
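
A minimal sketch of loading this subset, assuming the Hugging Face `datasets` library is installed and that the repository ID `hezarai/xlsum-fa` (taken from the download link) is reachable over the network. The helper names `load_split` and `preview` are illustrative and not part of the dataset card.

```python
from typing import Dict

REPO_ID = "hezarai/xlsum-fa"            # repository ID from the download link
COLUMNS = ("text", "summary", "title")  # string features listed in the card


def load_split(split: str = "train"):
    """Load one split ("train" or "test"); needs network access."""
    # Imported lazily so the rest of this module works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def preview(example: Dict[str, str], width: int = 80) -> Dict[str, str]:
    """Truncate each documented column of one example for display."""
    return {name: example[name][:width] for name in COLUMNS}


# Usage (requires `pip install datasets` and network access):
#   train = load_split("train")
#   print(train.num_rows)
#   print(preview(train[0]))
```

Only the `train` and `test` splits are declared in the card, so any validation set would have to be carved out of `train` by the user.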
### Citation
```bibtex
@inproceedings{hasan-etal-2021-xl,
title = "{XL}-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages",
author = "Hasan, Tahmid and
Bhattacharjee, Abhik and
Islam, Md. Saiful and
Mubasshir, Kazi and
Li, Yuan-Fang and
Kang, Yong-Bin and
Rahman, M. Sohel and
Shahriyar, Rifat",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.413",
pages = "4693--4703",
}
```
Provider:
hezarai
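Summary of original information
Dataset overview
Basic information
- Language: Persian (fa)
- Task category: summarization
- Dataset name: XLSum Persian
Dataset features
- text: string
- summary: string
- title: string
Dataset splits
- Train set (train):
  - Number of examples: 53128
  - Size in bytes: 326304750
- Test set (test):
  - Number of examples: 5906
  - Size in bytes: 29490737
Dataset size
- Download size: 168140025 bytes
- Total size: 355795487 bytes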
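
The figures above can be cross-checked with a little arithmetic. The sketch below only restates numbers from the card; the function name `summarize` and the derived ratios are illustrative additions, not values published with the dataset.

```python
# Figures copied from the dataset card (sizes in bytes).
SPLITS = {
    "train": {"num_bytes": 326_304_750, "num_examples": 53_128},
    "test": {"num_bytes": 29_490_737, "num_examples": 5_906},
}
DOWNLOAD_SIZE = 168_140_025
DATASET_SIZE = 355_795_487


def summarize(splits, download_size, dataset_size):
    """Derive totals and ratios from the per-split figures."""
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    total_examples = sum(s["num_examples"] for s in splits.values())
    return {
        # Do the per-split byte counts add up to the stated total size?
        "bytes_match": total_bytes == dataset_size,
        "total_examples": total_examples,
        "test_fraction": splits["test"]["num_examples"] / total_examples,
        "download_ratio": download_size / dataset_size,
        "avg_bytes": {
            name: s["num_bytes"] / s["num_examples"]
            for name, s in splits.items()
        },
    }


stats = summarize(SPLITS, DOWNLOAD_SIZE, DATASET_SIZE)
print(stats)
```

The two split sizes sum exactly to the stated total, the test split holds about 10% of the 59034 examples, and the download is a little under half the stated total size.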
Data file configuration
- Default configuration (default):
  - Train set path: data/train-*
  - Test set path: data/test-*
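
To illustrate how the two glob patterns assign files to splits, here is a rough sketch using Python's standard `fnmatch` module. This is an approximation for illustration, not the resolver the `datasets` library itself uses, and the shard file names are hypothetical since the card does not list them.

```python
from fnmatch import fnmatch

# Split patterns copied from the card's default configuration.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path, data_files=DATA_FILES):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in data_files.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical shard names, for illustration only.
examples = [
    "data/train-00000-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
for path in examples:
    print(path, "->", split_for(path))
```

Note that `fnmatch` lets `*` match path separators as well, which is looser than typical file-system globbing; that difference does not matter for the flat `data/` layout assumed here.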



