SumArabic
Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://data.mendeley.com/datasets/7kr75c9h24
Description:
SumArabic is an Arabic abstractive text summarization dataset.
The uploaded files are:
- sumarabic-1.0-index.jsonl.xz: an xz-compressed JSON Lines file that contains the Common Crawl records
- A script to download the data
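
As a rough illustration of how the index file could be inspected before running the download script, the sketch below reads an xz-compressed JSON Lines file with Python's standard library. It assumes only what the file extension suggests (one JSON object per line, xz compression). The field names inside each record are not documented here, so the code does not rely on any of them, and the helper names `iter_index_records` and `preview` are hypothetical and not part of the dataset's own script.

```python
import json
import lzma
from itertools import islice


def iter_index_records(path):
    """Yield one parsed JSON object per non-empty line of an xz-compressed JSONL file.

    Assumption: the file holds one JSON object per line, as the .jsonl.xz
    extension suggests. No particular record fields are assumed.
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def preview(path, limit=3):
    """Return the first `limit` records so their structure can be inspected."""
    return list(islice(iter_index_records(path), limit))
```

Calling `preview("sumarabic-1.0-index.jsonl.xz")` on a local copy of the file would return the first few records as Python dictionaries, which is one way to learn the record layout without decompressing the whole file.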
The data are from the following two Arabic news websites:
- emaratalyoum.com
- almamlakatv.com
The data are split into training, validation, testing, and out-of-domain sets. The number of examples in each split is as follows:
Training: 75,817
Validation: 4,121
Testing: 4,174
Out-of-domain: 652
Total: 84,764
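
The split sizes above can be checked against the stated total, and converted to percentage shares, with a few lines of Python. This is only a sanity-check sketch: the counts are copied from the listing, and the dictionary keys are illustrative labels rather than official split names.

```python
# Split sizes copied from the listing; the key names are illustrative labels.
splits = {
    "training": 75817,
    "validation": 4121,
    "testing": 4174,
    "out_of_domain": 652,
}

# Sum of all splits, to compare with the stated total of 84,764.
total = sum(splits.values())

# Percentage share of each split relative to the total.
shares = {name: 100 * count / total for name, count in splits.items()}

for name, share in shares.items():
    print(f"{name}: {splits[name]} examples ({share:.2f}%)")
print(f"total: {total}")
```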
Created:
2022-06-29



