jamalinu/tarifit-spanish-public-services
Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/jamalinu/tarifit-spanish-public-services
Resource description:
---
language:
- rif
- es
license: cc-by-nc-4.0
task_categories:
- translation
- text-classification
tags:
- tarifit
- tamazight
- spanish
- amazigh
- low-resource
- public-services
- morocco
- rif
pretty_name: Tarifit-Spanish Public Services Corpus
size_categories:
- n<1K
---
# Tarifit-Spanish Public Services Corpus
## Description
Parallel corpus between Tarifit (Riffian Tamazight) and Spanish,
focused on public services vocabulary. Tarifit is a Berber language
spoken by approximately 4 million people in the Rif region of
northern Morocco and by a significant diaspora in Catalonia,
the Netherlands, and Belgium.
This is the first publicly available parallel corpus for the
Tarifit-Spanish language pair.
## Domains
- `saludos` — Greetings and farewells (10)
- `comunicacio_basica` — Basic communication (21)
- `orientacio` — Urban orientation (3)
- `universitat` — University services (123)
- `allotjament` — Housing (12)
- `salut` — Health services (38)
- `biblioteca` — Library services (9)
- `comerc` — Commerce and stationery (24)
- `banc` — Banking (11)
- `esports` — Sports services (22)
- `correus` — Postal services (11)
- `telefon` — Telephone (17)
- `transports` — Public transport (23)
- `restauracio` — Food and restaurants (53)
- `lleure` — Leisure and socializing (17)
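
As a quick sanity check on the list above, the sketch below sums the figures in parentheses. It assumes those figures are per-domain entry counts, which the card does not state explicitly; the dictionary simply copies the numbers as listed.

```python
# Figures in parentheses from the Domains list, assumed to be entry counts.
DOMAIN_COUNTS = {
    "saludos": 10,
    "comunicacio_basica": 21,
    "orientacio": 3,
    "universitat": 123,
    "allotjament": 12,
    "salut": 38,
    "biblioteca": 9,
    "comerc": 24,
    "banc": 11,
    "esports": 22,
    "correus": 11,
    "telefon": 17,
    "transports": 23,
    "restauracio": 53,
    "lleure": 17,
}

total = sum(DOMAIN_COUNTS.values())
largest = max(DOMAIN_COUNTS, key=DOMAIN_COUNTS.get)
share = DOMAIN_COUNTS[largest] / total

print(f"total entries: {total}")               # total entries: 394
print(f"largest domain: {largest} ({share:.1%})")
```

Under that assumption the total is 394 entries, consistent with the `n<1K` size category, and `universitat` accounts for roughly a third of the corpus, so a stratified split may be worth considering for the classification task.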
## Dataset Structure
| Field | Description |
|---|---|
| `id` | Unique identifier |
| `text_rif` | Source text in Tarifit (Latin script) |
| `tifinagh` | Source text in Tifinagh script (when available) |
| `translation_cat` | Catalan translation |
| `source` | Origin of the entry: `original` or `ub_guia` |
| `dialect_var` | Dialectal variety: `nador` |
| `domain` | Thematic domain |
| `label` | Domain label (integer) |
| `type` | Entry type: `sentence`, `phrase`, or `vocabulary` |
| `subtopic` | Subcategory within domain |
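
A minimal sketch of how the fields in the table might be turned into translation pairs. The helper name `iter_pairs` and the sample rows are hypothetical; the placeholder strings stand in for real text and are not taken from the corpus. Only the field names come from the table.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional, Tuple


def iter_pairs(
    rows: Iterable[Mapping[str, Any]],
    domain: Optional[str] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (text_rif, translation_cat) pairs, skipping incomplete rows.

    If `domain` is given, only rows whose `domain` field matches are kept.
    """
    for row in rows:
        source = (row.get("text_rif") or "").strip()
        target = (row.get("translation_cat") or "").strip()
        if not source or not target:
            continue  # one side of the pair is missing
        if domain is not None and row.get("domain") != domain:
            continue
        yield source, target


# Hypothetical rows with placeholder text, for illustration only.
sample_rows = [
    {"id": 1, "text_rif": "<tarifit A>", "translation_cat": "<translation A>", "domain": "salut"},
    {"id": 2, "text_rif": "<tarifit B>", "translation_cat": "", "domain": "salut"},
    {"id": 3, "text_rif": "<tarifit C>", "translation_cat": "<translation C>", "domain": "banc"},
]

pairs = list(iter_pairs(sample_rows, domain="salut"))
print(pairs)
```

Because the function only needs an iterable of dict-like rows, it should also accept a split loaded with `load_dataset`, provided the column names match the table.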
## Language Notes
Tarifit is written here in Latin script, following common diaspora
usage. The corpus reflects the natural mixing of Tarifit with
Arabic loanwords and French/Spanish borrowings that is typical of
spoken Tarifit in urban contexts.
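
Since the corpus stores Latin-script text in `text_rif` and Tifinagh text in a separate `tifinagh` field, a simple script check can help catch values placed in the wrong column. The sketch below relies on the Unicode Tifinagh block (U+2D30 to U+2D7F); the function name is hypothetical and the test string is an illustrative word, not an entry quoted from the corpus.

```python
# Unicode Tifinagh block boundaries.
TIFINAGH_START = 0x2D30
TIFINAGH_END = 0x2D7F


def contains_tifinagh(text: str) -> bool:
    """Return True if any character of `text` lies in the Tifinagh block."""
    return any(TIFINAGH_START <= ord(ch) <= TIFINAGH_END for ch in text)


# Illustrative strings only: the same word in Tifinagh and in Latin script.
print(contains_tifinagh("\u2d30\u2d63\u2d53\u2d4d"))  # True
print(contains_tifinagh("azul"))                      # False
```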
## Source
Based on the Universitat de Barcelona conversational guide
(Guia de conversa universitària) and original material created
by the author.
## How to Use
```python
from datasets import load_dataset

# Download the corpus from the Hugging Face Hub (requires the `datasets` package).
dataset = load_dataset("jamalinu/tarifit-spanish-public-services")
```
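
Once the data is loaded, a first step is often to check how entries are distributed across `domain` and `type`. The helper below is a sketch that works on any iterable of dict-like rows; `count_by` is a hypothetical name, and the sample rows are placeholders that reuse field values listed in this card rather than real entries.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_by(rows: Iterable[Mapping[str, Any]], field: str) -> Counter:
    """Count how many rows carry each value of `field`."""
    return Counter(row[field] for row in rows)


# Placeholder rows for illustration; not taken from the corpus.
sample_rows = [
    {"domain": "salut", "type": "sentence"},
    {"domain": "salut", "type": "vocabulary"},
    {"domain": "banc", "type": "phrase"},
]

print(count_by(sample_rows, "domain"))  # Counter({'salut': 2, 'banc': 1})
print(count_by(sample_rows, "type"))
```

Assuming the dataset exposes a `train` split (the card does not name its splits), the same call would be `count_by(dataset["train"], "domain")`.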
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{tarifit_spanish_2026,
author = {Jamalinu},
title = {Tarifit-Spanish Public Services Corpus},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/jamalinu/tarifit-spanish-public-services}
}
```
## License
CC-BY-NC 4.0 — Free to use with attribution for non-commercial purposes.
For commercial use, contact the author.
## Author
Native Tarifit speaker and NLP researcher.
Dialect: Nador region (Rif, Morocco).
First published: March 2026.
Provider:
jamalinu



