Vatosoa/pos-tagging-malagasy-sokajy
Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Vatosoa/pos-tagging-malagasy-sokajy
Resource description:
---
language:
- mg
license: cc-by-nc-sa-4.0
size_categories:
- 1K<n<10K
task_categories:
- token-classification
tags:
- corpus
- malagasy
- pos-tagging
- conllu
---
# FITSIPIKA Malagasy Dataset (SOKAJY)
## Dataset Description
**SOKAJY** is a morphosyntactic corpus for the Malagasy language (28M speakers). It targets Part-of-Speech (POS) tagging and the analysis of linguistic structure, and is designed around the syntactic challenges specific to Malagasy.
- **Curator:** Vatosoa Razafindrazaka (Madagascar)
- **Format:** CoNLL-U
- **Language:** Official Malagasy (Merina and regional variants)
- **Status:** Actively maintained for PhD research on Malagasy ASR.
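
Since the corpus is distributed in CoNLL-U, the sketch below shows one way to read (word, tag) pairs from such a file using only the Python standard library. It relies on the standard CoNLL-U layout (ten tab-separated columns, `#` comment lines, a blank line between sentences, POS tag in the fourth column). The sample sentence, its tags, and the function name `read_conllu` are illustrative assumptions and are not taken from the dataset; the tagset actually used in SOKAJY may differ.

```python
from typing import Iterator, List, Tuple

# Hypothetical sample in CoNLL-U layout; NOT an excerpt from the dataset.
SAMPLE = (
    "# text = Mamaky boky aho.\n"
    "1\tMamaky\t_\tVERB\t_\t_\t_\t_\t_\t_\n"
    "2\tboky\t_\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "3\taho\t_\tPRON\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t_\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)


def read_conllu(text: str) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (form, pos_tag) pairs."""
    sentence: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id, form, pos_tag = cols[0], cols[1], cols[3]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        sentence.append((form, pos_tag))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


sentences = list(read_conllu(SAMPLE))
for form, pos_tag in sentences[0]:
    print(f"{form}\t{pos_tag}")
```

To apply this to the actual corpus, read a downloaded `.conllu` file as UTF-8 text and pass its contents to `read_conllu`; the file names inside the repository are not stated here, so check the dataset page for them.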
## Citation
If you use this dataset, please cite the following publication:
V. Razafindrazaka, "Part-of-Speech Tagging to Structure Malagasy Sentences", Springer Nature, Progress in IS (Expected April 2026).
Provider:
Vatosoa



