TAFILAT Dataset
TAFILAT Dataset: Comprehensive Solution Space for Arabic Poetic Meters
Dataset Summary
TAFILAT is a dataset that includes all possible patterns of Arabic poetic meters. The dataset was built following the rigorous traditional rules of Arabic prosody from six authoritative sources and verified by two experts in the field. The dataset encompasses the full range of 16 primary meters introduced by Al-Khalil ibn Ahmad Al-Farahidi, as well as the variations introduced by zihafat (substitutions) and ‘ilal (deviations).
The dataset includes:
- 16 poetic meters (bahrs)
- Taf’ilat patterns with various modifications (zihafat and ‘ilal)
- Prosodic symbols (0 for consonants and 1 for vowels)
- Meter-specific patterns for each poetic verse
- Details of each taf’ila and its modifications
- Rhyme scheme indicators
Data Description
The dataset is organized into a table with the following key attributes:
- Poetic Meter (Bahr): The metrical form of the poem.
- Taf’ilat Count: The number of taf’ilat, which can be 2, 3, 4, 6, or 8.
- Modification Status: Indicates whether the pattern contains zihafat (substitutions) or ‘ilal (deviations).
- Prosodic Symbols: Binary encoding (0 for consonants, 1 for vowels).
- Taf’ilat: The actual taf’ilat pattern for each poetic meter.
- Taf’ila Details: Specific details about the modifications (zihafat or ‘ilal) for each pattern.
- Rhyme Symbol: Encoded form of the rhyme scheme.
- Rhyme Boundaries: Boundaries of the rhyme pattern.
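As a rough illustration of the attributes listed above, one row could be modelled as a small record type. This is only a sketch: the field names are hypothetical and are not the actual CSV headers, and the validation only checks what the description states (the allowed taf’ilat counts and the binary prosodic encoding). The example values are taken from the sample table in this README.

```python
from dataclasses import dataclass

# Taf'ilat counts listed in the data description.
ALLOWED_COUNTS = {2, 3, 4, 6, 8}


@dataclass(frozen=True)
class TafilatRecord:
    """One dataset row. Field names are hypothetical, not the CSV headers."""

    bahr: str
    tafilat_count: int
    modified: bool          # Modification Status
    prosodic_symbols: str   # kept as text so leading zeros survive
    tafilat: str
    tafila_details: str
    rhyme_symbol: str
    rhyme_boundaries: str

    def __post_init__(self):
        if self.tafilat_count not in ALLOWED_COUNTS:
            raise ValueError(f"unexpected taf'ilat count: {self.tafilat_count}")
        symbols = self.prosodic_symbols
        if not symbols or set(symbols) - {"0", "1"}:
            raise ValueError("prosodic_symbols must be a non-empty string of 0s and 1s")


record = TafilatRecord(
    bahr="Tawil",
    tafilat_count=8,
    modified=True,
    prosodic_symbols="1010101010",
    tafilat="Faulun Mafailun",
    tafila_details="Zihaf: Qabadh",
    rhyme_symbol="0101",
    rhyme_boundaries="110",
)
print(record.bahr, record.tafilat_count)
```

Storing the binary fields as strings rather than integers is a deliberate choice here, since values such as 001 would otherwise lose their leading zeros.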
Methodology
Step 1: Pattern Generation
We applied the Bohor system to generate metrical patterns based on four key prosodic traditions:
- Al-Khalil ibn Ahmad’s System: Covers the 16 classical Arabic poetic meters used in classical Arabic poetry, known as "Qasidah."
- Modern Additions to Al-Khalil’s System: Includes new meters introduced by modern poets and used in colloquial Arabic poetry.
- Borrowed Meters: Includes meters borrowed from other literary traditions, such as the Persian-origin "Dobayti."
- Free Verse Taf’ilat-Based Poetry: Covers modern free verse that is built on taf’ilat but does not follow a single fixed meter.
Note: The current dataset covers only the traditional system of Al-Khalil ibn Ahmad.
Step 2: Database Construction
In this step, we built the TAFILAT database, cataloging every valid pattern for each meter. Only the classical meters from Al-Khalil’s system were used in this phase. The dataset therefore represents the full solution space for metrical analysis of these meters, encoding every possible pattern.
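The idea of a solution space can be sketched as a Cartesian product over the allowed variants of each foot. The example below is a simplified illustration, not the Bohor system or the authors' generation procedure. It assumes the common prosodic convention of 1 for a vowelled letter and 0 for a quiescent one, models only one zihaf (qabd, which drops a quiescent fifth letter), and ignores the positional restrictions that the traditional rules impose. The function names and the bit strings for the feet are illustrative and are not copied from the dataset.

```python
from itertools import product


def apply_qabd(foot: str) -> str:
    """Drop the fifth letter of a foot when it is quiescent (0)."""
    if len(foot) >= 5 and foot[4] == "0":
        return foot[:4] + foot[5:]
    return foot


def hemistich_patterns(feet):
    """Yield every concatenation of base or qabd-modified feet."""
    # dict.fromkeys keeps order and drops a variant equal to the base foot.
    options = [tuple(dict.fromkeys((foot, apply_qabd(foot)))) for foot in feet]
    for choice in product(*options):
        yield "".join(choice)


# Illustrative encodings of the two feet of Tawil (assumed convention).
FAULUN = "11010"
MAFAILUN = "1101010"

patterns = list(hemistich_patterns([FAULUN, MAFAILUN, FAULUN, MAFAILUN]))
print(len(patterns))
print(patterns[0])
```

A real generator would need one rule per zihaf and ‘illa, plus constraints on which feet may take which modification, so the count produced here overstates what the classical rules permit.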
Step 3: Verification and Evaluation
Linguistic experts evaluated the dataset’s accuracy, ensuring its reliability for further research. The taf’ilat were cross-verified against traditional sources and modern interpretations of Arabic poetic meter.
Importance and Applications
The TAFILAT dataset offers significant benefits to researchers and developers working in Arabic prosody, machine learning, and natural language processing (NLP):
- Metrical Analysis: Facilitates the study and analysis of Arabic poetic meters.
- Automated Poetry Evaluation: Helps in developing systems for automated evaluation of poetic structure and adherence to classical meters.
- NLP and Machine Learning: The dataset can be integrated into NLP or machine learning models for tasks like automated poetry generation, prosodic analysis, and verse classification.
- Pedagogical Tools: Useful for teaching Arabic poetry and prosody in both academic and informal settings.
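For metrical analysis and verse classification, one simple way to use an exhaustive pattern table is as a lookup index from prosodic symbols to the meters that allow them. The sketch below assumes the rows have already been reduced to (bahr, prosodic symbols) pairs; the helper names are hypothetical, and the two rows come from the sample table in this README rather than from the full dataset.

```python
from collections import defaultdict


def build_index(rows):
    """Map each prosodic symbol string to the set of meters that contain it."""
    index = defaultdict(set)
    for bahr, symbols in rows:
        index[symbols].add(bahr)
    return index


def candidate_meters(index, symbols):
    """Return the meters whose catalogued patterns match `symbols` exactly."""
    return sorted(index.get(symbols, ()))


rows = [("Tawil", "1010101010"), ("Basit", "110110110")]
index = build_index(rows)
print(candidate_meters(index, "110110110"))  # ['Basit']
print(candidate_meters(index, "111"))        # []
```

Exact matching only works once a verse has been converted to its prosodic symbols; that scansion step is outside the scope of this sketch.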
Sample Data
Here is a brief look at a sample from the TAFILAT dataset:
| Bahr | Taf’ilat Count | Prosodic Symbols | Taf’ilat | Modifications | Rhyme Symbol | Rhyme Boundaries |
|---|---|---|---|---|---|---|
| Tawil | 8 | 1010101010 | Faulun Mafailun | Zihaf: Qabadh | 0101 | 110 |
| Basit | 6 | 110110110 | Mustafilun Failun | Zihaf: Khabn | 1100 | 001 |
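The two sample rows can be parsed with Python's standard `csv` module as follows. This is a sketch under assumptions: the header names in `SAMPLE_CSV` are hypothetical stand-ins, since the real CSV headers are not shown here, and the text is an in-memory copy of the sample table rather than the dataset file.

```python
import csv
import io

# In-memory copy of the sample table; header names are assumed.
SAMPLE_CSV = """bahr,tafilat_count,prosodic_symbols,tafilat,modifications,rhyme_symbol,boundary
Tawil,8,1010101010,Faulun Mafailun,Zihaf: Qabadh,0101,110
Basit,6,110110110,Mustafilun Failun,Zihaf: Khabn,1100,001
"""


def read_rows(text):
    """Parse CSV text into dicts, keeping every bit string as text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # Only the count is numeric; the binary columns stay strings.
        row["tafilat_count"] = int(row["tafilat_count"])
    return rows


rows = read_rows(SAMPLE_CSV)
for row in rows:
    print(row["bahr"], row["tafilat_count"], row["boundary"])
```

Keeping the binary columns as strings matters because values such as 0101 and 001 begin with zeros. With pandas, passing `dtype=str` to `read_csv` has the same effect.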
How to Use the Dataset
The TAFILAT dataset is available in CSV format and can be integrated into various Natural Language Processing (NLP) or machine learning projects related to Arabic prosody and poetry analysis. To access the dataset:
- Clone the repository: git clone https://github.com/NoorBayan/TAFILAT.git
- Access detailed instructions: For comprehensive guidelines on how to use the dataset, including sample code, practical applications, and step-by-step instructions, see the documentation linked from the repository.
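Because the CSV file name is not stated here, a first step after cloning might be to locate the CSV files and inspect their headers. The sketch below assumes the clone lives in a directory named `TAFILAT` (the default for the `git clone` command above) and that the files are UTF-8 encoded; both helper functions are hypothetical.

```python
import csv
from pathlib import Path


def find_csv_files(repo_dir):
    """Return all CSV files under `repo_dir`, or an empty list if it is missing."""
    root = Path(repo_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.csv"))


def read_header(csv_path, encoding="utf-8-sig"):
    """Return the first row of a CSV file (assumed to hold the column names)."""
    # utf-8-sig also strips a byte-order mark if the file has one.
    with open(csv_path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    for path in find_csv_files("TAFILAT"):
        print(path, read_header(path))
```

Once the real column names are known, the rows can be read with `csv.DictReader` or a dataframe library.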
Steps to Use Tafilat on Google Colab:
- Run the first cell by clicking the "Run" button to load the necessary files and libraries.
- Execute the notebook, and dropdown menus will appear with various poetry categories. You can experiment by selecting different options, and the corresponding poetic data will be displayed based on your choices.
Future Work
The TAFILAT dataset is a foundational tool for Arabic meter research, and we envision several future expansions, including:
- Incorporating Modern Meters: Extending the dataset to include modern Arabic meters that are used in free verse and contemporary poetry, allowing a more comprehensive analysis.
- Enhancing Prosodic Visualization: Developing visual tools to represent taf’ilat patterns interactively, aiding in the intuitive understanding of complex poetic structures.
- Integration with Generative AI: Combining the dataset with AI models to create automated Arabic poem generation systems that follow classical metrical rules.
- Expansion to Other Dialects: Including meters from Arabic dialectal poetry and prosodic patterns in non-classical forms, enriching the dataset for broader applications.
Contributing
We welcome contributions from the community! Whether you are an expert in Arabic prosody, a data scientist, or a developer interested in enhancing this dataset, your input is valuable. To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Make your changes and add tests if applicable.
- Submit a pull request with a detailed description of your changes.
Please refer to our CONTRIBUTING.md file for more information.
License
The TAFILAT dataset is open-source and is licensed under the MIT License. Feel free to use, modify, and distribute the dataset in your research or applications, as long as proper attribution is given.




