Oromo Auto-Grammar Dataset

Source: NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://data.mendeley.com/datasets/n5wg3mbp9r

Description:

This contribution is a novel dataset called Oromo-grammar-dataset, prepared with a custom Python algorithm. The input was a 200 KB sample (about 100 pages of raw text) collected from online sources, from which the algorithm automatically generated a grammar-aware dataset for the Oromo language. The method can be reproduced for other languages after a systematic analysis of the target language and slight modifications to the affix structures the algorithm uses. The output is a grammar-rich dataset applicable to modern NLP applications such as machine translation, sentence completion, and grammar and spell checking. The dataset can also help linguists and academics teach the grammatical structures of the language.

Created:

2023-05-03
