Morphologically-Analyzed and Syntactically-Annotated Quran Dataset (MASAQ)
Source: Mendeley Data. Indexed 2026-04-09.
Download link:
https://data.mendeley.com/datasets/9yvrzxktmr/1
Description:
The Morphologically-Analyzed and Syntactically-Annotated Quran (MASAQ) dataset is a resource designed to address the scarcity of annotated Quranic Arabic corpora and to support the development of Natural Language Processing (NLP) models. MASAQ provides detailed morphological and syntactic annotation of the entire Quranic text, based on the rigorously verified text from Tanzil.net. The dataset includes more than 131K morphological entries and 123K instances of syntactic functions, covering a wide range of grammatical roles and relationships. The annotation was carried out by a team of expert Arabic linguists who applied traditional i'rab methodologies to ensure high accuracy and consistency. The dataset is distributed in multiple formats (txt, CSV, xlsx, XML, JSON) to suit different research needs. Potential applications range from pedagogical uses in teaching Arabic grammar to the development of NLP tools. By providing a high-quality, syntactically annotated dataset, MASAQ aims to advance the field of Arabic NLP and enable more accurate and more efficient language processing tools. The dataset is made available under the Creative Commons Attribution 3.0 License, ensuring compliance with ethical guidelines and respecting the integrity of the Quranic text.
The Morphologically-Analyzed and Syntactically-Annotated Quran (MASAQ) dataset has potential applications across several domains. Pedagogically, it can simplify the teaching of Arabic grammar by focusing on fundamental concepts. In NLP, MASAQ can improve tools such as part-of-speech taggers and parsers, which are essential for automated language understanding. Linguistically, the dataset provides syntactic analysis that is valuable for research. Additionally, dependency parsers derived from MASAQ can efficiently analyze web content, resolve several types of sentence ambiguity, and contribute to semantic representations. The dataset also supports efforts such as Universal Dependencies, facilitating cross-linguistic research and multilingual NLP tool development. Furthermore, integrating dependency parsing with machine learning classifiers can improve parsing accuracy and efficiency, which is particularly useful for languages with free word order, such as written Arabic. Overall, MASAQ is a comprehensive resource for advancing both academic and practical applications in Arabic NLP.
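Since the description lists CSV among the distributed formats, a first look at the annotations might be a simple tally of tags. The following is only a minimal sketch: the column names `pos_tag` and `syntactic_function` are hypothetical (the description does not document the actual header), and the sample rows are made-up placeholders, not real MASAQ entries.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; check the real header of the MASAQ CSV before use.
POS_COLUMN = "pos_tag"
FUNCTION_COLUMN = "syntactic_function"


def read_rows(handle):
    """Read CSV rows as dictionaries keyed by the header line."""
    return list(csv.DictReader(handle))


def count_column(rows, column):
    """Count the non-empty values of one column across dict rows."""
    return Counter(row[column] for row in rows if row.get(column))


# Placeholder rows standing in for the real file; the values are made up.
SAMPLE = io.StringIO(
    "token_id,pos_tag,syntactic_function\n"
    "t1,NOUN,subject\n"
    "t2,VERB,\n"
    "t3,NOUN,object\n"
    "t4,NOUN,subject\n"
)

rows = read_rows(SAMPLE)
pos_counts = count_column(rows, POS_COLUMN)
function_counts = count_column(rows, FUNCTION_COLUMN)

print(pos_counts.most_common())
print(function_counts.most_common())
```

To run this against the downloaded file, the in-memory sample could be replaced with a handle from `open(path, encoding="utf-8-sig", newline="")`, with the two column constants adjusted to match the actual header.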



