arnizamani/Sindhi-texts-big-dataset
Hugging Face | Updated 2026-01-20 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/arnizamani/Sindhi-texts-big-dataset
Description:
---
license: mit
language:
- sd
---
# Sindhi Texts (big dataset)
This dataset contains texts scraped from several sources and curated manually.
The texts are organized in folders and files, with each file containing one entry (book, magazine, encyclopedia entry, etc.).
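
Since each file holds one entry, a local copy of the dataset can be inventoried by walking its folders. The sketch below is only an illustration under stated assumptions: the folder names, file extensions, and text encoding are not described in this card, so the code counts every non-hidden file and assumes UTF-8 when reading. The function names `count_entries` and `read_entry` are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_entries(root: str) -> Counter:
    """Count files per top-level folder under `root` (one file = one entry)."""
    root_path = Path(root)
    counts: Counter = Counter()
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        # Skip hidden files and folders such as .git or .gitattributes.
        if any(part.startswith(".") for part in rel.parts):
            continue
        # Files directly under the root (e.g. a README) are grouped under ".".
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[top] += 1
    return counts


def read_entry(path: str) -> str:
    """Read one entry as text. Assumption: files are UTF-8 encoded."""
    return Path(path).read_text(encoding="utf-8")
```

Calling `count_entries("Sindhi-texts-big-dataset")`, where the argument is a hypothetical path to a local copy, would return a per-folder file count that can be compared against the item counts listed for each source.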
### Version 1.0, January 2026
The current version contains texts from the following sources:
* sindhiadabiboard.org: digitized books and magazines (910 items)
* sindhiana.org: the largest Sindhi encyclopedia; several volumes have been published and a few are still to be published (20,229 items in this corpus)
* quran.sindhsalamat.com: 14 complete translations of the Quran
* books.sindhsalamat.com: 2,747 books digitized by the Sindhsalamat forum
The total size is 1.28 GB.
Future improvements:
* Newspaper articles (very few, as most Sindhi newspapers don't publish digital editions)
* Social media (tweets, etc.)
* Updates to the existing sources (if new content is available)
Provider:
arnizamani



