
TAFILAT Dataset

Source: GitHub, updated 2024-09-22, indexed 2024-10-08
Download link:
https://github.com/NoorBayan/Tafilat
Resource summary:
The TAFILAT Dataset is a comprehensive dataset of all possible Arabic poetic meter patterns, carefully curated from classical prosodic sources. It is designed to support research on automating and improving Arabic prosody classification and poetry generation.
Created:
2024-09-13
Summary of the Original Information

TAFILAT Dataset: Comprehensive Solution Space for Arabic Poetic Meters

Dataset Summary

TAFILAT is a dataset that includes all possible patterns of Arabic poetic meters. The dataset was built following the rigorous traditional rules of Arabic prosody from six authoritative sources and verified by two experts in the field. The dataset encompasses the full range of 16 primary meters introduced by Al-Khalil ibn Ahmad Al-Farahidi, as well as the variations introduced by zihafat (substitutions) and ‘ilal (deviations).
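To make the idea of zihafat concrete, here is a minimal Python sketch (not the authors' generator) of how single-letter substitutions reshape a taf’ila. It assumes the classical letter-level notation in which 1 marks a voweled letter and 0 a vowelless one, which may not match TAFILAT's own symbol encoding, and it covers only four zihafat that delete a vowelless letter (khabn, tayy, qabd, kaff). The `ALLOWED` table is a small, hypothetical subset of which zihafat each taf’ila accepts; the real rules also depend on the meter and on the position of the foot.

```python
from typing import Dict, Optional

# Assumed letter-level encoding: "1" = voweled letter (mutaharrik), "0" = vowelless (sakin).
BASE = {
    "fa'ulun": "11010",        # //0/0
    "mafa'ilun": "1101010",    # //0/0/0
    "fa'ilun": "10110",        # /0//0
    "mustaf'ilun": "1010110",  # /0/0//0
}

# Zihafat that delete a vowelless letter at a fixed 1-based position.
DELETE_POSITION = {"khabn": 2, "tayy": 4, "qabd": 5, "kaff": 7}

# Hypothetical, non-exhaustive list of zihafat each taf'ila accepts.
ALLOWED = {
    "fa'ulun": ["qabd"],
    "mafa'ilun": ["qabd", "kaff"],
    "fa'ilun": ["khabn"],
    "mustaf'ilun": ["khabn", "tayy"],
}


def apply_zihaf(pattern: str, zihaf: str) -> Optional[str]:
    """Delete the letter targeted by `zihaf` if it is vowelless; otherwise return None."""
    i = DELETE_POSITION[zihaf] - 1
    if i >= len(pattern) or pattern[i] != "0":
        return None
    return pattern[:i] + pattern[i + 1:]


def variants(tafila: str) -> Dict[str, str]:
    """Return the sound (salim) form plus each allowed single-zihaf variant."""
    out = {"salim": BASE[tafila]}
    for zihaf in ALLOWED[tafila]:
        result = apply_zihaf(BASE[tafila], zihaf)
        if result is not None:
            out[zihaf] = result
    return out


print(variants("mafa'ilun"))
# {'salim': '1101010', 'qabd': '110110', 'kaff': '110101'}
```

Multi-letter changes and the ‘ilal, which mainly affect the last foot of each hemistich, would need their own rules.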

The dataset includes:

  • 16 poetic meters (bahrs)
  • Taf’ilat patterns with various modifications (zihafat and ‘ilal)
  • Prosodic symbols (0 for consonants and 1 for vowels)
  • Meter-specific patterns for each poetic verse
  • Details of each tafila and its modifications
  • Rhyme scheme indicators
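As an illustration of how a meter-specific pattern can be assembled from taf’ilat, the sketch below concatenates per-foot symbols for one hemistich of al-Tawil. The encoding (1 for a voweled letter, 0 for a vowelless one, i.e. the classical "/" and "0") and the helper names are assumptions for illustration; the dataset's own symbol strings, such as those in the sample rows, may follow a different scheme.

```python
from typing import List

# Assumed letter-level encoding: "1" = voweled letter (mutaharrik), "0" = vowelless (sakin).
TAFILA_SYMBOLS = {
    "fa'ulun": "11010",        # //0/0
    "mafa'ilun": "1101010",    # //0/0/0
    "mustaf'ilun": "1010110",  # /0/0//0
    "fa'ilun": "10110",        # /0//0
}


def to_binary(arud: str) -> str:
    """Convert classical slash notation ('/' voweled, '0' vowelless) to 1/0 symbols."""
    return arud.replace("/", "1")


def hemistich_pattern(tafilat: List[str]) -> str:
    """Concatenate the symbols of each taf'ila, in order, into one pattern."""
    return "".join(TAFILA_SYMBOLS[t] for t in tafilat)


tawil = ["fa'ulun", "mafa'ilun", "fa'ulun", "mafa'ilun"]
print(hemistich_pattern(tawil))  # 110101101010110101101010
print(to_binary("/0//0"))        # 10110
```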

Data Description

The dataset is organized into a table with the following key attributes:

  1. Poetic Meter (Bahr): The metrical form of the poem.
  2. Taf’ilat Count: The number of taf’ilat, which can be 2, 3, 4, 6, or 8.
  3. Modification Status: Indicates whether the pattern has zihafat (substitutions) or ‘ilal (deviations).
  4. Prosodic Symbols: Binary encoding (0 for consonants, 1 for vowels).
  5. Taf’ilat: The actual taf’ilat pattern for each poetic meter.
  6. Taf’ila Details: Specific details about the modifications (zihafat or ‘ilal) for each pattern.
  7. Rhyme Symbol: Encoded form of the rhyme scheme.
  8. Rhyme Boundaries: Boundaries of the rhyme pattern.
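A typed record can make the eight attributes explicit and catch malformed rows early. This is a sketch only: the field names are hypothetical (they are not the CSV's actual headers), and the checks encode just two rules stated above, namely the allowed taf’ilat counts and the binary prosodic symbols.

```python
from dataclasses import dataclass

VALID_COUNTS = {2, 3, 4, 6, 8}  # taf'ilat counts listed in the data description


@dataclass
class TafilatRecord:
    """One row of the table. Field names are hypothetical, not the CSV's real headers."""

    bahr: str
    tafilat_count: int
    has_modification: bool
    prosodic_symbols: str
    tafilat: str
    tafila_details: str
    rhyme_symbol: str
    rhyme_boundary: str

    def __post_init__(self) -> None:
        # Reject counts the description does not list.
        if self.tafilat_count not in VALID_COUNTS:
            raise ValueError(f"unexpected taf'ilat count: {self.tafilat_count}")
        # Prosodic symbols must be a non-empty string of 0s and 1s.
        if not self.prosodic_symbols or set(self.prosodic_symbols) - {"0", "1"}:
            raise ValueError(f"invalid prosodic symbols: {self.prosodic_symbols!r}")


# The first sample row from the README, mapped onto the hypothetical fields.
row = TafilatRecord("Tawil", 8, True, "1010101010", "Faulun Mafailun",
                    "Zihaf: Qabadh", "0101", "110")
```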

Methodology

Step 1: Pattern Generation

We applied the Bohor system to generate metrical patterns based on four key prosodic traditions:

  • Al-Khalil ibn Ahmad’s System: Covers the 16 classical Arabic poetic meters used in classical Arabic poetry, notably the ode known as the "Qasidah."
  • Modern Additions to Al-Khalil’s System: New meters introduced by modern poets and used in colloquial Arabic poetry.
  • Borrowed Meters: Meters borrowed from other literary traditions, such as the Persian-origin "Dobayti."
  • Free-Verse Taf’ilat-Based Poetry: Modern free verse that is built on taf’ilat but does not strictly adhere to a specific meter.

Note: The current dataset covers only the traditional system of Al-Khalil ibn Ahmad.
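Because the dataset is described as a full solution space, one way to picture pattern generation is as a Cartesian product over the variants allowed at each foot position. The Python sketch below shows that idea with `itertools.product`; it is not the Bohor system, and the slot table is a simplified, hypothetical Tawil-like hemistich rather than the authoritative rules (a real generator would also apply the ‘ilal that govern the final foot).

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical, simplified variant table for a Tawil-like hemistich (one dict per foot).
# Encoding assumption: "1" = voweled letter, "0" = vowelless letter.
SLOTS: List[Dict[str, str]] = [
    {"salim": "11010", "qabd": "1101"},      # fa'ulun: sound / qabd
    {"salim": "1101010", "kaff": "110101"},  # mafa'ilun: sound / kaff
    {"salim": "11010", "qabd": "1101"},      # fa'ulun: sound / qabd
    {"salim": "1101010", "qabd": "110110"},  # mafa'ilun: sound / qabd
]


def enumerate_patterns(slots: List[Dict[str, str]]) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (variant names, joined pattern) for every combination of per-foot variants."""
    for combo in product(*(slot.items() for slot in slots)):
        names = tuple(name for name, _ in combo)
        pattern = "".join(symbols for _, symbols in combo)
        yield names, pattern


patterns = list(enumerate_patterns(SLOTS))
print(len(patterns))     # 16, i.e. 2 * 2 * 2 * 2
print(patterns[0][1])    # 110101101010110101101010
```

The product grows multiplicatively with each foot, which is why cataloging every combination explicitly yields a large but finite table.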

Step 2: Database Construction

In this step, we built the TAFILAT database, carefully cataloging every valid pattern for each meter. Only the classical meters of Al-Khalil’s system were used in this phase. The dataset represents a full solution space for metrical analysis, encoding every possible pattern.
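One way to catalog every valid pattern exactly once is a uniqueness constraint in a relational store. The sketch uses Python's built-in `sqlite3` with a hypothetical schema and a few partial (two-foot) patterns in an assumed letter-level 1/0 encoding; the published dataset itself is a CSV file, so this only illustrates the deduplication idea.

```python
import sqlite3

# Hypothetical schema; the released dataset is a CSV file, not SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pattern (
           bahr TEXT NOT NULL,
           tafilat TEXT NOT NULL,
           symbols TEXT NOT NULL,
           UNIQUE (bahr, symbols)  -- each pattern is stored once per meter
       )"""
)

rows = [
    ("Tawil", "fa'ulun mafa'ilun", "110101101010"),
    ("Tawil", "fa'ulu mafa'ilun", "11011101010"),
    ("Tawil", "fa'ulun mafa'ilun", "110101101010"),  # duplicate, silently ignored
    ("Basit", "mustaf'ilun fa'ilun", "101011010110"),
]
conn.executemany("INSERT OR IGNORE INTO pattern VALUES (?, ?, ?)", rows)

for bahr, n in conn.execute("SELECT bahr, COUNT(*) FROM pattern GROUP BY bahr ORDER BY bahr"):
    print(bahr, n)
# Basit 1
# Tawil 2
```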

Step 3: Verification and Evaluation

Linguistic experts evaluated the dataset’s accuracy, ensuring its reliability for further research. The taf’ilat were cross-verified against traditional sources and modern interpretations of Arabic poetic meter.
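Expert review is manual, but a simple set comparison can flag where generated patterns and a source disagree before the experts look at them. The function and example patterns below are hypothetical (letter-level encoding, 1 = voweled, 0 = vowelless); here the reference list includes the kaff form of mafa’ilun, which the generated set lacks.

```python
from typing import Dict, Set


def cross_verify(generated: Set[str], reference: Set[str]) -> Dict[str, Set[str]]:
    """Split patterns into those both sides agree on and those only one side has."""
    return {
        "confirmed": generated & reference,
        "missing": reference - generated,     # attested in the source but not generated
        "unattested": generated - reference,  # generated but not found in the source
    }


generated = {"1101010", "110110"}            # mafa'ilun: sound, qabd
reference = {"1101010", "110110", "110101"}  # the source also lists kaff
report = cross_verify(generated, reference)
print(sorted(report["missing"]))     # ['110101']
print(sorted(report["unattested"]))  # []
```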

Importance and Applications

The TAFILAT dataset offers significant benefits to researchers and developers working in Arabic prosody, machine learning, and natural language processing (NLP):

  • Metrical Analysis: Facilitates the study and analysis of Arabic poetic meters.
  • Automated Poetry Evaluation: Helps in developing systems for automated evaluation of poetic structure and adherence to classical meters.
  • NLP and Machine Learning: The dataset can be integrated into NLP or machine learning models for tasks like automated poetry generation, prosodic analysis, and verse classification.
  • Pedagogical Tools: Useful for teaching Arabic poetry and prosody in both academic and informal settings.
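With a complete solution space, meter identification for an already-scanned verse reduces to a lookup. The sketch below indexes patterns by their symbol string and returns every meter that can produce it. It assumes the verse has already been converted to 1/0 symbols (scansion of diacritized Arabic text is a separate and harder step), and the two hemistich patterns use a letter-level encoding (1 = voweled, 0 = vowelless) assumed for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each symbol pattern to every meter that can realize it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for bahr, symbols in rows:
        if bahr not in index[symbols]:
            index[symbols].append(bahr)
    return dict(index)


def classify(symbols: str, index: Dict[str, List[str]]) -> List[str]:
    """Return candidate meters for an already-scanned hemistich (empty if none match)."""
    return index.get(symbols, [])


index = build_index([
    ("Tawil", "110101101010110101101010"),  # fa'ulun mafa'ilun fa'ulun mafa'ilun
    ("Basit", "101011010110101011010110"),  # mustaf'ilun fa'ilun mustaf'ilun fa'ilun
])
print(classify("110101101010110101101010", index))  # ['Tawil']
print(classify("1111", index))                      # []
```

A list per pattern, rather than a single meter, leaves room for patterns that more than one meter can realize.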

Sample Data

Here is a brief look at a sample from the TAFILAT dataset:

Bahr | Taf’ilat Count | Prosodic Symbols | Tafilat | Modifications | Rhyme Symbol | Boundary
Tawil | 8 | 1010101010 | Faulun Mafailun | Zihaf: Qabadh | 0101 | 110
Basit | 6 | 110110110 | Mustafilun Failun | Zihaf: Khabn | 1100 | 001

How to Use the Dataset

The TAFILAT dataset is available in CSV format and can be integrated into various Natural Language Processing (NLP) or machine learning projects related to Arabic prosody and poetry analysis. To access the dataset:

  1. Clone the repository:

       git clone https://github.com/NoorBayan/TAFILAT.git

  2. Access detailed instructions: For comprehensive guidelines on using the dataset, including sample code and practical applications, open the project's Google Colab notebook (the "Open in Colab" link in the repository README), which gives step-by-step instructions.
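A minimal way to read the CSV with only the Python standard library is sketched below. The column headers are assumed from the sample table and the file path in the comment is hypothetical; check the repository for the real file name and headers. The two sample rows stand in for the file so the sketch runs on its own.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

# The README's two sample rows, standing in for the real file. Headers are assumed.
SAMPLE_CSV = """Bahr,Tafilat Count,Prosodic Symbols,Tafilat,Modifications,Rhyme Symbol,Boundary
Tawil,8,1010101010,Faulun Mafailun,Zihaf: Qabadh,0101,110
Basit,6,110110110,Mustafilun Failun,Zihaf: Khabn,1100,001
"""


def load_rows(stream: TextIO) -> List[Dict[str, Any]]:
    """Read every row as a dict; only the count column is converted to int."""
    rows: List[Dict[str, Any]] = list(csv.DictReader(stream))
    for row in rows:
        row["Tafilat Count"] = int(row["Tafilat Count"])
    # csv leaves other fields as strings, so codes like "001" keep their leading zeros.
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
# For the real file (hypothetical path):
# with open("tafilat.csv", encoding="utf-8", newline="") as f:
#     rows = load_rows(f)
print(rows[1]["Bahr"], rows[1]["Boundary"])  # Basit 001
```

Reading the symbol columns as strings matters here: parsing them as numbers would silently drop leading zeros such as those in "0101" and "001".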

Steps to Use Tafilat on Google Colab:

  1. Run the first cell by clicking the "Run" button to load the necessary files and libraries.
  2. Execute the notebook, and dropdown menus will appear with various poetry categories. You can experiment by selecting different options, and the corresponding poetic data will be displayed based on your choices.
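The notebook's widgets are not reproduced here; the sketch below only illustrates the kind of filtering such dropdowns perform, keeping rows that match every selected option. Field names follow the sample table and are assumptions.

```python
from typing import Any, Dict, List, Optional


def select(rows: List[Dict[str, Any]], bahr: Optional[str] = None,
           count: Optional[int] = None) -> List[Dict[str, Any]]:
    """Keep rows matching every chosen option; None means 'any', like an unset dropdown."""
    return [
        row for row in rows
        if (bahr is None or row["Bahr"] == bahr)
        and (count is None or row["Tafilat Count"] == count)
    ]


rows = [
    {"Bahr": "Tawil", "Tafilat Count": 8, "Tafilat": "Faulun Mafailun"},
    {"Bahr": "Basit", "Tafilat Count": 6, "Tafilat": "Mustafilun Failun"},
]
print([r["Bahr"] for r in select(rows, count=6)])  # ['Basit']
```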

Future Work

The TAFILAT dataset is a foundational tool for Arabic meter research, and we envision several future expansions, including:

  • Incorporating Modern Meters: Extending the dataset to include modern Arabic meters used in free verse and contemporary poetry, allowing more comprehensive analysis.
  • Enhancing Prosodic Visualization: Developing visual tools to represent taf’ilat patterns interactively, aiding in the intuitive understanding of complex poetic structures.
  • Integration with Generative AI: Combining the dataset with AI models to create automated Arabic poem generation systems that follow classical metrical rules.
  • Expansion to Other Dialects: Including meters from Arabic dialectal poetry and prosodic patterns in non-classical forms, enriching the dataset for broader applications.
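As a text-only starting point for prosodic visualization, the sketch aligns each taf’ila name above its pattern in the classical slash notation ("/" for a voweled letter, "0" for a vowelless one). The 1/0 input encoding and the function names are assumptions, not part of the dataset.

```python
from typing import List


def to_arud(symbols: str) -> str:
    """Render 1/0 symbols in slash notation: '/' for voweled, '0' for vowelless."""
    return symbols.replace("1", "/")


def render(names: List[str], symbols: List[str]) -> str:
    """Print each taf'ila name above its notation, one padded column per foot."""
    widths = [max(len(n), len(s)) for n, s in zip(names, symbols)]
    top = " | ".join(n.ljust(w) for n, w in zip(names, widths))
    bottom = " | ".join(to_arud(s).ljust(w) for s, w in zip(symbols, widths))
    return top + "\n" + bottom


print(render(["fa'ulun", "mafa'ilun"], ["11010", "1101010"]))
# fa'ulun | mafa'ilun
# //0/0   | //0/0/0
```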

Contributing

We welcome contributions from the community! Whether you are an expert in Arabic prosody, a data scientist, or a developer interested in enhancing this dataset, your input is valuable. To contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or bugfix.
  3. Make your changes and add tests if applicable.
  4. Submit a pull request with a detailed description of your changes.

Please refer to our CONTRIBUTING.md file for more information.

License

The TAFILAT dataset is open-source and is licensed under the MIT License. Feel free to use, modify, and distribute the dataset in your research or applications, as long as proper attribution is given.

Compiled Summary
Dataset Introduction
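Construction
The TAFILAT dataset was built in strict accordance with the traditional rules of classical Arabic prosody, with every possible Arabic poetic meter pattern carefully drawn from and verified against six authoritative sources. First, metrical patterns were generated by applying the Bohor system on the basis of four key prosodic traditions. Next, the TAFILAT database was built, recording in detail the valid patterns for each meter. Finally, linguistic experts evaluated the dataset’s accuracy to ensure its reliability and scholarly value.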
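Usage
The TAFILAT dataset is provided in CSV format and is suitable for a range of natural language processing (NLP) and machine learning projects, especially those involving Arabic prosody and poetry analysis. Users can obtain the dataset by cloning the GitHub repository and can follow the detailed guide in Google Colab, which includes sample code and practical application steps. With these resources, users can integrate the dataset into their own research or applications for in-depth prosodic analysis and poetry generation.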
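Background and Challenges
Background
Arabic poetry holds deep importance in the cultural fabric of the Arab world, and its strict metrical rules give it precision and beauty. Traditional Arabic prosody, ‘Ilm al-‘Arud, uses a well-defined system of metrical feet (taf’ilat), which is essential for poets to avoid metrical errors (taksir). Modern researchers, however, often limit themselves to identifying the main meters and overlook their complex and varied patterns. The TAFILAT dataset fills this gap by providing all possible taf’ilat patterns, drawn from a prosodic tradition more than 1,200 years old. Built from six authoritative classical works on Arabic prosody and verified by two domain experts, it covers the 16 primary meters introduced by Al-Khalil ibn Ahmad Al-Farahidi and their variants. TAFILAT offers a complete solution space for Arabic meter classification and poetry generation, and gives scholars and AI researchers a resource for studying meter classification, automating poetry analysis, and generating metrically correct Arabic poetry.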
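Current Challenges
Building TAFILAT involved several challenges. First, generating metrical patterns that follow the traditional rules requires a thorough understanding of the intricacies of Arabic prosody, and therefore deep domain expertise. Second, the database had to record every metrical pattern and its variants precisely to guarantee completeness and accuracy. Third, the verification and evaluation stage required domain experts to confirm the dataset’s reliability. On the application side, integrating the dataset effectively into NLP and machine learning models for automated poetry generation and prosodic analysis remains an important challenge. Future research directions include extending the dataset to modern and dialectal Arabic meters and developing interactive visualization tools.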
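Common Use Cases
Typical Use Cases
In Arabic poetry research, the typical uses of TAFILAT are automated poetry analysis and generation. Because the dataset provides every possible Arabic poetic meter pattern, researchers can build systems that automatically identify and classify the meter of Arabic poems. The dataset can also be used to generate Arabic poetry that follows classical metrical rules, furthering the automation of Arabic poetry composition.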
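Academic Problems Addressed
TAFILAT addresses a long-standing problem in Arabic poetry research: recognizing metrical patterns. Researchers have traditionally focused only on the main meter patterns and overlooked their complex variants. By providing a comprehensive set of metrical patterns, the dataset fills this gap and enables scholars to understand and analyze the metrical structure of Arabic poetry in more depth, advancing research in the field.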
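Practical Applications
In practice, TAFILAT can be used to build automated evaluation and teaching tools for Arabic poetry. For example, educational institutions could use it to develop tools that automatically check whether students’ poems follow classical metrical rules, improving teaching outcomes. It can also support intelligent poetry-generation systems that offer creative inspiration to enthusiasts of Arabic poetry.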
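An evaluation tool needs to say how far a verse is from a meter, not only whether it matches. The sketch below ranks meters by Levenshtein (edit) distance between a scanned verse and each meter's patterns, so a small distance points to a likely metrical slip. Edit distance is a swapped-in technique chosen for illustration, not a method described by the dataset authors, and the catalog holds two hypothetical hemistich patterns in an assumed letter-level 1/0 encoding.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_meters(verse: str, catalog: Dict[str, List[str]]) -> List[Tuple[int, str]]:
    """Rank meters by the smallest distance from the verse to any of their patterns."""
    scores = [(min(edit_distance(verse, p) for p in patterns), bahr)
              for bahr, patterns in catalog.items()]
    return sorted(scores)


catalog = {
    "Tawil": ["110101101010110101101010"],  # fa'ulun mafa'ilun, twice
    "Hazaj": ["11010101101010"],            # mafa'ilun, twice
}
verse = "110101101010110101101011"  # Tawil pattern with the last letter voweled by mistake
print(nearest_meters(verse, catalog)[0])  # (1, 'Tawil')
```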
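Recent Research on the Dataset
Latest Research Directions
Recent research directions for TAFILAT center on deep learning models for automated poetry generation and prosodic analysis. By supplying comprehensive Arabic meter patterns, the dataset gives researchers a rich resource for building automatic poetry-generation systems grounded in classical metrical rules. Researchers are also working to combine the dataset with advanced AI techniques such as generative adversarial networks (GANs) to produce Arabic poetry that meets traditional metrical requirements while remaining original. This work advances the digitization of Arabic poetry and offers new perspectives and methods for cross-cultural poetry research.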
The content above was collected and summarized by 遇见数据集.