Shah Abdul Latif Bhittai Poetry Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/7FQ6JJ
Description:
Shah Abdul Latif Bhittai’s Poetry Dataset “Shah Jo Risalo”

Developed by: Abdul Majid Bhurgri Institute of Language Engineering (AMBILE), Hyderabad
Under the administrative control of the Culture, Tourism, Antiquities & Archives Department, Government of Sindh.

The “Shah Jo Risalo” Dataset is a rich linguistic and literary resource comprising 43,779 Sindhi poetic verses extracted from the 30 traditional Surs of Shah Abdul Latif Bhittai’s magnum opus. Each verse is paired with a Sindhi-language explanation, primarily derived from the authoritative interpretations of Dr. Nabi Bakhsh Baloch. This dataset provides profound insights into the philosophical, spiritual, and cultural themes embedded in the poetry, making it an invaluable asset for Sindhi literature researchers, linguists, educators, and AI/NLP developers.

Dataset Features
- Total Verses: 43,779 poetic lines from 30 classical Surs
- Language: Clean Sindhi script in Unicode format
- File Format: CSV file titled “Shah Jo Risalo labeled.csv”

CSV Structure
- melody (سر): Name of the Sur
- chapter (داستان): Subsection within the Sur
- chapter_verse_number: Verse number within the dastan
- poetry_text (بيت): Original Sindhi poetic verse
- explanation (وضاحت): Sindhi interpretation of the verse
- keywords: Search-optimized terms
- compiler_name: Name of the compiler of various versions of Shah Jo Risalo

Compilers
- Allama I. I. Qazi
- Banhoo Khan Shaikh
- Dr. Nabi Bux 28 Surs Explanation
- Dr. Ernest Trumpp
- GM Shahwani
- Gurbakh Shani
- Mirza Qaleech Baig
- Tara Chand Shokeeram
- Usman Ali Ansari
- Usman Diplai
- Adwani Kaliyan
- Dr. Nabi Bux Baloch (1165–1207 Hijri)
- Dr. Nabi Bux Baloch (1269 Hijri and 1270 Hijri)
- Dr. Nabi Bux Baloch (British Museum)

Applications
- Natural Language Processing (NLP) research in Sindhi
- AI-based Sindhi chatbots and conversational agents
- Development of educational tools for literature learning
- Text-to-Speech (TTS) system training
- Verse classification and sentiment analysis tasks
- Digital preservation and promotion of Sindhi literary heritage

Data Source
The dataset is sourced from the AMBILE Bhittaipedia project.

License
This dataset is shared under the Creative Commons Attribution-Noncommercial 4.0 International License, allowing its use for educational and research purposes.

Acknowledgments
Special thanks to the AMBILE team for their involvement in data compilation and cleaning.

Contact
For inquiries, collaborations, or contributions:
Email: datasets@sindh.ai
Created:
2025-08-29
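
The CSV structure described above could be read with Python's standard `csv` module roughly as follows. This is a minimal sketch, not the maintainers' own tooling: it assumes the header row uses the plain English column names from the listing (without the Sindhi labels in parentheses) and that the file is UTF-8 encoded. The function names `load_verses`, `missing_columns`, `verses_per_sur`, and `by_compiler` are hypothetical helpers introduced only for illustration.

```python
import csv
from collections import Counter

# Column names as given in the dataset description. The exact header
# spelling in the real file is an assumption and may need adjusting.
EXPECTED_COLUMNS = [
    "melody",
    "chapter",
    "chapter_verse_number",
    "poetry_text",
    "explanation",
    "keywords",
    "compiler_name",
]


def load_verses(path):
    """Read the CSV into a list of dicts, one dict per verse row."""
    # "utf-8-sig" accepts plain UTF-8 and also strips a leading BOM if present.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))


def missing_columns(rows, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the data."""
    if not rows:
        return list(expected)
    return [name for name in expected if name not in rows[0]]


def verses_per_sur(rows, column="melody"):
    """Count how many rows belong to each Sur."""
    return Counter(row[column] for row in rows)


def by_compiler(rows, name, column="compiler_name"):
    """Keep only the rows attributed to one compiler."""
    return [row for row in rows if row[column] == name]
```

A typical session might call `load_verses("Shah Jo Risalo labeled.csv")`, check that `missing_columns(rows)` is empty before relying on the column names, and then use `verses_per_sur(rows)` to see how the rows are distributed across the Surs.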



