jamalinu/tarifit-verb-conjugations
Source: Hugging Face | Updated: 2026-03-25 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/jamalinu/tarifit-verb-conjugations
Resource description:
---
language:
- rif
- es
- ca
license: cc-by-4.0
task_categories:
- translation
- text-generation
- morphology-prediction
tags:
- tarifit
- amazigh
- berber
- low-resource
- morphology
- verb-conjugation
pretty_name: Tarifit Morphological Verb Dataset
size_categories:
- n<1K
---
# 🏔️ Tarifit Morphological Verb Dataset (TMVD)
## 📌 Overview
The **Tarifit Morphological Verb Dataset (TMVD)** is a human-curated resource intended to improve digital language support for **Tarifit** (Riffian Berber), a severely underrepresented Afro-Asiatic language spoken by millions in the Rif region and the diaspora.
Unlike many large-scale datasets for low-resource languages that rely on noisy web-scraping or inaccurate machine translation, this dataset has been **manually authored and validated by a native speaker** with a background in linguistics and philology.
## 🧪 Research Value
This dataset provides a granular look at Tarifit verbal morphology, which is characterized by complex word-initial and stem-internal changes. It is particularly valuable for:
* **Instruction Tuning:** Training LLMs to understand and generate grammatically correct Tarifit.
* **Morphological Analysis:** Evaluating the ability of models like Gemini, Llama, or Claude to handle non-concatenative morphology.
* **Low-Resource Benchmarking:** Serving as a gold standard for zero-shot or few-shot translation tasks involving Catalan and Spanish.
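As a rough illustration of the instruction-tuning use case, the sketch below turns one conjugation row into a prompt/completion pair. It assumes each row is a dictionary keyed by the column names documented in this card; the function name `row_to_instruction_pair`, the prompt wording, and the row values are hypothetical placeholders, not actual Tarifit data or a format prescribed by the dataset.

```python
from typing import Dict


def row_to_instruction_pair(row: Dict[str, str]) -> Dict[str, str]:
    """Turn one conjugation row into a prompt/completion pair.

    The prompt template is an assumption made for illustration only.
    """
    prompt = (
        f"Conjugate the Tarifit verb '{row['verb_id']}' "
        f"({row['translation']}) for the subject '{row['pronoun']}' "
        f"in the {row['tense/subtype']}."
    )
    return {"prompt": prompt, "completion": row["form"]}


# Placeholder row: these values are NOT real Tarifit data.
example_row = {
    "verb_id": "VERB",
    "translation": "GLOSS",
    "pronoun": "PRONOUN",
    "person/number/gender": "1sg",
    "tense/subtype": "preterite",
    "form": "FORM",
}

pair = row_to_instruction_pair(example_row)
print(pair["prompt"])
print(pair["completion"])
```

Keeping the target form as the only content of the completion makes it straightforward to score a model's output by exact match against the `form` column.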
## 📊 Dataset Structure
The dataset contains structured conjugations for core verbs (including *iri, aqqa, ttuga, gar, day*), covering the following categories:
* **Preterite & Imperfective:** Capturing the aspectual nuances of the Berber verb system.
* **Future & Imperative:** Including prohibitive and intensive forms.
* **Deictic & Existential Verbs:** Essential for building dialogue systems.
### Column Metadata
| Column | Description |
| :--- | :--- |
| `verb_id` | Citation form (Lemma). |
| `translation` | Semantic gloss in Spanish/Catalan. |
| `pronoun` | Subject pronoun (Independent form). |
| `person/number/gender` | Grammatical features for agreement verification. |
| `tense/subtype` | Morphosyntactic category. |
| `form` | The target conjugated form in Latin-based transcription. |
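The column layout above lends itself to grouping rows into paradigm tables, one per verb and tense. The following is a minimal sketch under the assumption that each row is a dictionary whose keys match the column names exactly as written in the table (including the slashes); the helper names `missing_columns` and `build_paradigms`, the feature labels, and the row values are hypothetical placeholders rather than real dataset content.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Column names as listed in the table above (assumed to be the literal keys).
EXPECTED_COLUMNS = (
    "verb_id",
    "translation",
    "pronoun",
    "person/number/gender",
    "tense/subtype",
    "form",
)


def missing_columns(row: Dict[str, str]) -> List[str]:
    """Return the expected columns that are absent from a row."""
    return [col for col in EXPECTED_COLUMNS if col not in row]


def build_paradigms(
    rows: Iterable[Dict[str, str]],
) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Group forms by (verb, tense) and index them by agreement features."""
    paradigms: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for row in rows:
        key = (row["verb_id"], row["tense/subtype"])
        paradigms[key][row["person/number/gender"]] = row["form"]
    return dict(paradigms)


# Placeholder rows: these values are NOT real Tarifit data.
rows = [
    {"verb_id": "VERB_A", "translation": "GLOSS", "pronoun": "PRON_1",
     "person/number/gender": "1sg", "tense/subtype": "preterite",
     "form": "FORM_1"},
    {"verb_id": "VERB_A", "translation": "GLOSS", "pronoun": "PRON_2",
     "person/number/gender": "2sg.m", "tense/subtype": "preterite",
     "form": "FORM_2"},
    {"verb_id": "VERB_A", "translation": "GLOSS", "pronoun": "PRON_2",
     "person/number/gender": "2sg.m", "tense/subtype": "imperative",
     "form": "FORM_3"},
]

paradigms = build_paradigms(rows)
for (verb, tense), cells in paradigms.items():
    print(verb, tense, cells)
```

Indexing each paradigm by the `person/number/gender` value makes it easy to spot missing cells or to compare a model's predicted form against the reference for a given agreement slot.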
## 🛠️ Applications
This project is part of an ongoing effort to create a **Tarifit-Catalan Public Service Dialogue System**, aimed at improving accessibility and linguistic inclusion.
## 📜 Citation & License
If you use this dataset in your research, please cite it as follows:
```bibtex
@dataset{tarifit_verb_conjugations_2025,
  author = {Jamal I. (jamalinu)},
  title = {Tarifit Morphological Verb Dataset},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/jamalinu/tarifit-verb-conjugations}
}
```
Provider:
jamalinu



