
jamalinu/tarifit-verb-conjugations

Source: Hugging Face. Updated 2026-03-25. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jamalinu/tarifit-verb-conjugations
Description:
---
language:
- rif
- es
- ca
license: cc-by-4.0
task_categories:
- translation
- text-generation
- morphology-prediction
tags:
- tarifit
- amazigh
- berber
- low-resource
- morphology
- verb-conjugation
pretty_name: Tarifit Morphological Verb Dataset
size_categories:
- n<1K
---

# 🏔️ Tarifit Morphological Verb Dataset (TMVD)

## 📌 Overview

The **Tarifit Morphological Verb Dataset (TMVD)** is a high-quality, human-curated resource designed to bridge the gap in Digital Language Support for **Tarifit** (Riffian Berber), a severely underrepresented Afro-Asiatic language spoken by millions in the Rif region and the diaspora.

Unlike many large-scale datasets for low-resource languages that rely on noisy web-scraping or inaccurate machine translation, this dataset has been **manually authored and validated by a native speaker** with a background in linguistics and philology.

## 🧪 Research Value

This dataset provides a granular look at Tarifit verbal morphology, which is characterized by complex initial and internal changes. It is particularly valuable for:

* **Instruction Tuning:** Training LLMs to understand and generate grammatically correct Tarifit.
* **Morphological Analysis:** Evaluating the ability of models like Gemini, Llama, or Claude to handle non-concatenative morphology.
* **Low-Resource Benchmarking:** Serving as a "Gold Standard" for zero-shot or few-shot translation tasks involving Catalan and Spanish.

## 📊 Dataset Structure

The dataset contains structured conjugations for core verbs (including *iri, aqqa, ttuga, gar, day*), covering various tenses and moods:

* **Preterite & Imperfective:** Capturing the aspectual nuances of the Berber verb system.
* **Future & Imperative:** Including prohibitive and intensive forms.
* **Deictic & Existential Verbs:** Essential for building dialogue systems.

### Column Metadata

| Column | Description |
| :--- | :--- |
| `verb_id` | Citation form (Lemma). |
| `translation` | Semantic gloss in Spanish/Catalan. |
| `pronoun` | Subject pronoun (Independent form). |
| `person/number/gender` | Grammatical features for agreement verification. |
| `tense/subtype` | Morphosyntactic category. |
| `form` | The target conjugated form in Latin-based transcription. |

## 🛠️ Applications

This project is part of an ongoing effort to create a **Tarifit-Catalan Public Service Dialogue System**, aimed at improving accessibility and linguistic inclusion.

## 📜 Citation & License

If you use this dataset in your research, please cite it as follows:

```bibtex
@dataset{tarifit_verb_conjugations_2025,
  author = {Jamal I. (jamalinu)},
  title = {Tarifit Morphological Verb Dataset},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/jamalinu/tarifit-verb-conjugations}
}
```
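
As a rough illustration of how the columns listed under Column Metadata might be used for the instruction-tuning and paradigm-checking purposes the card mentions, here is a minimal Python sketch. It is not the author's pipeline. The `ConjugationRow` class, the `features` and `tense` field names, the feature labels such as `1sg`, and the helper functions are all hypothetical, and the card does not say whether `person/number/gender` and `tense/subtype` are stored as single or separate columns. The rows are placeholders: only the lemma `iri` comes from the card, and every gloss, pronoun, and form is a dummy string rather than real Tarifit data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConjugationRow:
    """One dataset row; field names are assumed, not taken from the actual files."""
    verb_id: str      # citation form (lemma)
    translation: str  # Spanish/Catalan gloss
    pronoun: str      # independent subject pronoun
    features: str     # person/number/gender, e.g. "1sg" (assumed encoding)
    tense: str        # tense/subtype label (assumed encoding)
    form: str         # conjugated form in Latin-based transcription


def build_paradigms(rows):
    """Group rows into {(verb_id, tense): {features: form}}."""
    paradigms = defaultdict(dict)
    for row in rows:
        paradigms[(row.verb_id, row.tense)][row.features] = row.form
    return dict(paradigms)


def to_instruction_pair(row):
    """Turn one row into a (prompt, target) pair for instruction tuning."""
    prompt = (
        f"Conjugate the Tarifit verb '{row.verb_id}' ({row.translation}) "
        f"for {row.features} in the {row.tense}."
    )
    return prompt, row.form


# Placeholder rows: dummy strings stand in for real glosses, pronouns, and forms.
rows = [
    ConjugationRow("iri", "<gloss>", "<pronoun-1sg>", "1sg", "preterite", "<form-1sg>"),
    ConjugationRow("iri", "<gloss>", "<pronoun-2sg>", "2sg", "preterite", "<form-2sg>"),
    ConjugationRow("iri", "<gloss>", "<pronoun-1sg>", "1sg", "future", "<form-fut-1sg>"),
]

paradigms = build_paradigms(rows)
prompt, target = to_instruction_pair(rows[0])
```

Grouping by lemma and tense makes it easy to spot missing cells in a paradigm, while the prompt/target pairs keep the gold form separate from the instruction text so a model's output can be compared against it directly.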
Provider:
jamalinu