jamalinu/tarifit-spanish-public-services

Source: Hugging Face. Updated: 2026-03-23. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jamalinu/tarifit-spanish-public-services
Resource description:
---
language:
- rif
- es
license: cc-by-nc-4.0
task_categories:
- translation
- text-classification
tags:
- tarifit
- tamazight
- spanish
- amazigh
- low-resource
- public-services
- morocco
- rif
pretty_name: Tarifit-Spanish Public Services Corpus
size_categories:
- n<1K
---

# Tarifit-Spanish Public Services Corpus

## Description

Parallel corpus between Tarifit (Riffian Tamazight) and Spanish, focused on public services vocabulary. Tarifit is a Berber language spoken by approximately 4 million people in the Rif region of northern Morocco and by a significant diaspora in Catalonia, the Netherlands, and Belgium.

This is the first publicly available parallel corpus for the Tarifit-Spanish language pair.

## Domains

- `saludos`: Greetings and farewells (10)
- `comunicacio_basica`: Basic communication (21)
- `orientacio`: Urban orientation (3)
- `universitat`: University services (123)
- `allotjament`: Housing (12)
- `salut`: Health services (38)
- `biblioteca`: Library services (9)
- `comerc`: Commerce and stationery (24)
- `banc`: Banking (11)
- `esports`: Sports services (22)
- `correus`: Postal services (11)
- `telefon`: Telephone (17)
- `transports`: Public transport (23)
- `restauracio`: Food and restaurants (53)
- `lleure`: Leisure and socializing (17)

## Dataset Structure

| Field | Description |
|---|---|
| `id` | Unique identifier |
| `text_rif` | Source text in Tarifit (Latin script) |
| `tifinagh` | Source text in Tifinagh script (when available) |
| `translation_cat` | Catalan translation |
| `source` | Origin: original, ub_guia |
| `dialect_var` | Dialectal variety: nador |
| `domain` | Thematic domain |
| `label` | Domain label (integer) |
| `type` | Entry type: sentence, phrase, vocabulary |
| `subtopic` | Subcategory within domain |

## Language Notes

Tarifit is written here in Latin script following common diaspora usage. The corpus reflects natural code-switching between Tarifit, Arabic loanwords, and French/Spanish borrowings typical of spoken Tarifit in urban contexts.

## Source

Based on the Universitat de Barcelona conversational guide (Guia de conversa universitària) and original material created by the author.

## How to Use

```python
from datasets import load_dataset

dataset = load_dataset("jamalinu/tarifit-spanish-public-services")
```

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{tarifit_spanish_2026,
  author    = {Jamalinu},
  title     = {Tarifit-Spanish Public Services Corpus},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/jamalinu/tarifit-spanish-public-services}
}
```

## License

CC-BY-NC 4.0: free to use with attribution for non-commercial purposes. For commercial use, contact the author.

## Author

Native Tarifit speaker and NLP researcher. Dialect: Nador region (Rif, Morocco). First published: March 2026.
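
## Example: Working with Domains

The following is a minimal sketch of how the `domain` field from the Dataset Structure table might be used to select and count entries. It runs offline on placeholder rows: the `sample` list, its `<...>` strings, and the helper names `filter_by_domain` and `count_by_domain` are hypothetical and are not part of the dataset or of any library. Only the per-domain counts in `DOMAIN_COUNTS` are taken from the Domains list in the card.

```python
from collections import Counter

# Per-domain entry counts, copied from the Domains list in the card.
DOMAIN_COUNTS = {
    "saludos": 10,
    "comunicacio_basica": 21,
    "orientacio": 3,
    "universitat": 123,
    "allotjament": 12,
    "salut": 38,
    "biblioteca": 9,
    "comerc": 24,
    "banc": 11,
    "esports": 22,
    "correus": 11,
    "telefon": 17,
    "transports": 23,
    "restauracio": 53,
    "lleure": 17,
}


def filter_by_domain(records, domain):
    """Return the rows whose `domain` field equals the given domain name."""
    return [row for row in records if row.get("domain") == domain]


def count_by_domain(records):
    """Count rows per `domain` value."""
    return Counter(row["domain"] for row in records)


# Hypothetical placeholder rows (NOT real corpus content); real rows would
# come from the dataset loaded as shown under "How to Use".
sample = [
    {"id": 1, "text_rif": "<tarifit text 1>", "domain": "salut", "type": "sentence"},
    {"id": 2, "text_rif": "<tarifit text 2>", "domain": "banc", "type": "phrase"},
    {"id": 3, "text_rif": "<tarifit text 3>", "domain": "salut", "type": "vocabulary"},
]

# Total number of entries implied by the per-domain counts in the card.
total = sum(DOMAIN_COUNTS.values())
print(total)  # 394, consistent with the n<1K size category

print([row["id"] for row in filter_by_domain(sample, "salut")])  # [1, 3]
print(count_by_domain(sample))
```

Rows yielded by a loaded split are dictionaries keyed by field name, so the same helpers should apply to them. The card does not state which split names exist, so check the loaded object before indexing into it.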
Provider:
jamalinu