
jamalinu/tarifit-catalan-public-services

Source: Hugging Face. Updated: 2026-03-25. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jamalinu/tarifit-catalan-public-services
Resource description:
---
language:
- rif
- es
- ca
license: cc-by-4.0
task_categories:
- translation
- text-generation
- morphology-prediction
tags:
- tarifit
- amazigh
- berber
- low-resource
- morphology
- verb-conjugation
pretty_name: Tarifit Morphological Verb Dataset
size_categories:
- n<1K
---

# 🏔️ Tarifit Morphological Verb Dataset (TMVD)

## 📌 Overview

The **Tarifit Morphological Verb Dataset (TMVD)** is a high-quality, human-curated resource designed to bridge the gap in Digital Language Support for **Tarifit** (Riffian Berber). Despite being spoken by millions in the Rif region and the diaspora, Tarifit remains a severely underrepresented Afro-Asiatic language in the NLP landscape.

Unlike many large-scale datasets for low-resource languages that rely on noisy web-scraping or inaccurate machine translation, this dataset has been **manually authored and validated by a native speaker** with a professional background in linguistics and multilingual proficiency (Berber, Arabic, Catalan, Spanish, French, English).

## 🧪 Research Value

This dataset provides a granular look at Tarifit verbal morphology, characterized by complex initial and internal changes. It is particularly valuable for:

* **Instruction Tuning:** Training LLMs to understand and generate grammatically correct Tarifit.
* **Morphological Analysis:** Evaluating the ability of models like Gemini, Llama, or Claude to handle non-concatenative morphology.
* **Low-Resource Benchmarking:** Serving as a "Gold Standard" for zero-shot or few-shot translation tasks involving Catalan and Spanish.

## 📊 Dataset Structure

The dataset contains structured conjugations for core verbs (including *iri, aqqa, ttuga, gar, day*), covering various tenses and moods:

* **Preterite & Imperfective:** Capturing the aspectual nuances of the Berber verb system.
* **Future & Imperative:** Including prohibitive and intensive forms.
* **Deictic & Existential Verbs:** Essential for building dialogue systems and spatial reasoning.
### Column Metadata

| Column | Description | Example |
| :--- | :--- | :--- |
| `verb_id` | Citation form (Lemma) | `iri` |
| `translation` | Semantic gloss in Spanish/Catalan | `ser/existir` |
| `pronoun` | Subject pronoun (Independent form) | `Nec` |
| `person` | Grammatical person (1, 2, 3) | `1` |
| `number` | Grammatical number (sg, pl) | `sg` |
| `gender` | Grammatical gender (masc, fem, neuter) | `neuter` |
| `tense` | Morphosyntactic category | `preterito` |
| `form` | The target conjugated form (Latin-based) | `edjig` |

## 🛠️ Applications

This project is a foundational component of an ongoing effort to create a **Tarifit-Catalan Public Service Dialogue System**, aimed at improving accessibility and linguistic inclusion for the Berber-speaking community.

## 📜 Citation & License

If you use this dataset in your research or for model training, please cite it as follows:

```bibtex
@dataset{tarifit_verb_conjugations_2025,
  author = {Jamal I. (jamalinu)},
  title = {Tarifit Morphological Verb Dataset},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/jamalinu/tarifit-verb-conjugations}
}
```
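
## 🐍 Example: Working with the Column Schema

As a rough illustration of the Column Metadata table above, the following Python sketch validates one record and turns it into a prompt/completion pair for instruction tuning. It is not part of the dataset or of any library: `ConjugationRow`, `parse_row`, `to_instruction_pair`, and the prompt wording are hypothetical names and choices made for this example. The only record used is the example row from the table, and the stored column types (for instance, whether `person` is an integer or a string) are assumed.

```python
from dataclasses import dataclass

# Allowed values as listed in the Column Metadata table.
PERSONS = {1, 2, 3}
NUMBERS = {"sg", "pl"}
GENDERS = {"masc", "fem", "neuter"}


@dataclass(frozen=True)
class ConjugationRow:
    """One conjugated form (hypothetical typed view of a dataset row)."""
    verb_id: str      # citation form (lemma)
    translation: str  # Spanish/Catalan gloss
    pronoun: str      # independent subject pronoun
    person: int       # 1, 2, or 3
    number: str       # "sg" or "pl"
    gender: str       # "masc", "fem", or "neuter"
    tense: str        # morphosyntactic category label
    form: str         # conjugated form in Latin script


def parse_row(raw: dict) -> ConjugationRow:
    """Coerce a raw record into a ConjugationRow and check category values."""
    row = ConjugationRow(
        verb_id=str(raw["verb_id"]),
        translation=str(raw["translation"]),
        pronoun=str(raw["pronoun"]),
        person=int(raw["person"]),  # assumption: may arrive as int or str
        number=str(raw["number"]),
        gender=str(raw["gender"]),
        tense=str(raw["tense"]),
        form=str(raw["form"]),
    )
    if row.person not in PERSONS:
        raise ValueError(f"unexpected person: {row.person!r}")
    if row.number not in NUMBERS:
        raise ValueError(f"unexpected number: {row.number!r}")
    if row.gender not in GENDERS:
        raise ValueError(f"unexpected gender: {row.gender!r}")
    return row


def to_instruction_pair(row: ConjugationRow) -> dict:
    """Build a prompt/completion pair; the prompt template is an assumption."""
    prompt = (
        f"Conjugate the Tarifit verb '{row.verb_id}' ({row.translation}) "
        f"for the pronoun '{row.pronoun}' in the tense '{row.tense}'."
    )
    return {"prompt": prompt, "completion": row.form}


# The example row from the Column Metadata table.
RAW_EXAMPLE = {
    "verb_id": "iri",
    "translation": "ser/existir",
    "pronoun": "Nec",
    "person": 1,
    "number": "sg",
    "gender": "neuter",
    "tense": "preterito",
    "form": "edjig",
}

example = parse_row(RAW_EXAMPLE)
pair = to_instruction_pair(example)
print(pair["prompt"])
print(pair["completion"])
```

Validating the categorical columns up front keeps malformed rows out of a training set; the same `parse_row` call could be applied to each record after loading the data with a tool of your choice.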
Provider:
jamalinu