
Shah Abdul Latif Bhittai Poetry Dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://doi.org/10.7910/DVN/7FQ6JJ
Description:
Shah Abdul Latif Bhittai’s Poetry Dataset “Shah Jo Risalo”

Developed by: Abdul Majid Bhurgri Institute of Language Engineering (AMBILE), Hyderabad
Under the administrative control of the Culture, Tourism, Antiquities & Archives Department, Government of Sindh.

The “Shah Jo Risalo” Dataset is a rich linguistic and literary resource comprising 43,779 Sindhi poetic verses extracted from the 30 traditional Surs of Shah Abdul Latif Bhittai’s magnum opus. Each verse is paired with a Sindhi-language explanation, primarily derived from the authoritative interpretations of Dr. Nabi Bakhsh Baloch. This dataset provides profound insights into the philosophical, spiritual, and cultural themes embedded in the poetry, making it an invaluable asset for Sindhi literature researchers, linguists, educators, and AI/NLP developers.

Dataset Features
- Total Verses: 43,779 poetic lines from 30 classical Surs
- Language: Clean Sindhi script in Unicode format
- File Format: CSV file titled “Shah Jo Risalo labeled.csv”

CSV Structure
- melody (سر): Name of the Sur
- chapter (داستان): Subsection within the Sur
- chapter_verse_number: Verse number within the dastan
- poetry_text (بيت): Original Sindhi poetic verse
- explanation (وضاحت): Sindhi interpretation of the verse
- keywords: Search-optimized terms
- compiler_name: Name of the compiler of various versions of Shah Jo Risalo

Compilers
- Allama I. I. Qazi
- Banhoo Khan Shaikh
- Dr. Nabi Bux 28 Surs Explanation
- Dr. Ernest Trumpp
- GM Shahwani
- Gurbakh Shani
- Mirza Qaleech Baig
- Tara Chand Shokeeram
- Usman Ali Ansari
- Usman Diplai
- Adwani Kaliyan
- Dr. Nabi Bux Baloch (1165–1207 Hijri)
- Dr. Nabi Bux Baloch (1269 Hijri and 1270 Hijri)
- Dr. Nabi Bux Baloch (British Museum)

Applications
- Natural Language Processing (NLP) research in Sindhi
- AI-based Sindhi chatbots and conversational agents
- Development of educational tools for literature learning
- Text-to-Speech (TTS) system training
- Verse classification and sentiment analysis tasks
- Digital preservation and promotion of Sindhi literary heritage

Data Source
The dataset is sourced from the AMBILE Bhittaipedia project.

License
This dataset is shared under the Creative Commons Attribution-NonCommercial 4.0 International License, allowing its use for educational and research purposes.

Acknowledgments
Special thanks to the AMBILE team for their involvement in data compilation and cleaning.

Contact
For inquiries, collaborations, or contributions:
Email: datasets@sindh.ai
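Example: reading the CSV
The column layout listed under CSV Structure can be checked and summarized with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the CSV header uses exactly the seven column names given in the description, and that the file is UTF-8 encoded (the description only says "Unicode"). The helper names read_verses and verses_per_sur and the sample rows are hypothetical placeholders, not real data from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
# Assumption: the CSV header matches these names exactly.
COLUMNS = [
    "melody", "chapter", "chapter_verse_number",
    "poetry_text", "explanation", "keywords", "compiler_name",
]


def read_verses(handle):
    """Parse all rows from an open text handle into dicts.

    Raises ValueError if any expected column is missing from the header.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def verses_per_sur(rows):
    """Count rows per Sur (the 'melody' column)."""
    return Counter(row["melody"] for row in rows)


# Placeholder rows for illustration only; these are NOT taken from the dataset.
SAMPLE = (
    "melody,chapter,chapter_verse_number,poetry_text,explanation,keywords,compiler_name\n"
    "Sur A,1,1,verse one,explanation one,kw1,Compiler X\n"
    "Sur A,1,2,verse two,explanation two,kw2,Compiler X\n"
    "Sur B,1,1,verse three,explanation three,kw3,Compiler Y\n"
)

rows = read_verses(io.StringIO(SAMPLE))
counts = verses_per_sur(rows)
print(counts)

# With the downloaded file (encoding is an assumption; utf-8-sig also accepts a BOM):
# with open("Shah Jo Risalo labeled.csv", encoding="utf-8-sig", newline="") as f:
#     rows = read_verses(f)
```

Checking the header up front makes a mismatch between the description and the actual file visible immediately, rather than as a KeyError later. Grouping by compiler_name instead of melody works the same way.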
Created:
2025-08-29