AgaCKNER: First Kurdish Sorani Named Entity Recognition Dataset
Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/b3wvj6jgx8
Resource description:
AgaCKNER is the first publicly accessible Named Entity Recognition (NER) dataset for the Kurdish Sorani language, developed to advance research in low-resource language processing. Drawn from the Rudaw Media Network, it covers five domains: Kurdistan news, Middle East news, world news, economic news, and sports news, curated from more than 160 articles. The dataset contains 2,534 sentences and 64,563 tokens, pre-processed and formatted in CoNLL for NER tasks. Entities are labelled in BIO format under five categories: PERSON, LOCATION, ORGANIZATION, DATE, and Miscellaneous. AgaCKNER serves as a resource for Kurdish Sorani natural language processing and for research on low-resource languages more broadly. Its structure makes it easy to generate training, validation, and test splits.
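The description mentions CoNLL formatting, BIO labels, and easy generation of splits. The following is a minimal sketch of how such a file might be read and divided at the sentence level. It assumes a common CoNLL layout (one token per line, the token in the first whitespace-separated column, the BIO tag in the last, and a blank line between sentences) and tag spellings such as B-LOCATION; the actual column layout and tag strings in the released files may differ. The function names, the 80/10/10 ratio, and the English placeholder sample are hypothetical and not taken from the dataset.

```python
import random
from typing import List, Tuple

# A sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]


def parse_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text into sentences.

    Assumption: token is the first column, BIO tag is the last column,
    and sentences are separated by blank lines.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        # Handle a final sentence that is not followed by a blank line.
        sentences.append(current)
    return sentences


def split_sentences(sentences: List[Sentence], train: float = 0.8,
                    dev: float = 0.1, seed: int = 13):
    """Shuffle with a fixed seed and cut into train/validation/test lists."""
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


# Hypothetical placeholder sample (English tokens, assumed tag spellings).
SAMPLE = """Rudaw B-ORGANIZATION
reported O
from O
Erbil B-LOCATION
. O

Swansea B-ORGANIZATION
University I-ORGANIZATION
"""

if __name__ == "__main__":
    parsed = parse_conll(SAMPLE)
    train_set, dev_set, test_set = split_sentences(parsed)
    print(len(parsed), len(train_set), len(dev_set), len(test_set))
```

Splitting by sentence rather than by token keeps each multi-token entity (a B- tag followed by its I- tags) within a single partition.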
Provided by:
Sulaimani Polytechnic University; Swansea University - Bay Campus



