AgaCKNER: First Kurdish Sorani Named Entity Recognition Dataset

Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/b3wvj6jgx8
Description:
AgaCKNER is the first publicly accessible Named Entity Recognition (NER) dataset for the Kurdish Sorani language, developed to advance research in low-resource language processing. The text is drawn from more than 160 Rudaw Media Network articles spanning five domains: Kurdistan news, Middle East news, world news, economic news, and sports news. The dataset contains 2,534 sentences and 64,563 tokens, pre-processed and stored in CoNLL format for NER tasks. Entities are labelled with the BIO scheme under five categories: PERSON, LOCATION, ORGANIZATION, DATE, and Miscellaneous. AgaCKNER provides a resource for Kurdish Sorani natural language processing, and its structure makes it straightforward to generate training, validation, and test splits.
Provided by:
Sulaimani Polytechnic University; Swansea University - Bay Campus
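
The description says the data is stored in CoNLL format with BIO labels and can be divided into training, validation, and test splits. The following is a minimal sketch of how that might look in Python. It assumes one token per line with the tag in the last whitespace-separated column and a blank line between sentences, which is the common CoNLL layout but is not confirmed by the listing. The tag spellings (such as `B-ORGANIZATION`), the function names, the 80/10/10 ratio, and the English sample tokens are all illustrative assumptions, not taken from the dataset.

```python
import random
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences (assumed layout: token ... tag)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        parts = raw.split()
        if not parts:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if len(parts) < 2:  # skip malformed lines that carry no tag column
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) spans from the BIO tags of one sentence."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            tokens.append(token)  # continue the open entity
        else:  # an "O" tag closes any open entity
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


def split_dataset(sentences: List[Sentence], train=0.8, dev=0.1, seed=13):
    """Shuffle at sentence level and cut into train / validation / test."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_dev],
        shuffled[n_train + n_dev:],
    )


# Invented English placeholder text; the real dataset is in Kurdish Sorani.
SAMPLE = """\
Rudaw B-ORGANIZATION
reported O
from O
Erbil B-LOCATION
. O

Swansea B-ORGANIZATION
University I-ORGANIZATION
. O
"""

sentences = read_conll(SAMPLE.splitlines())
entities = [extract_entities(s) for s in sentences]
train_set, dev_set, test_set = split_dataset(sentences * 5)
```

Splitting at sentence level keeps each sentence's tag sequence intact. Splitting by source article instead would need article boundaries, which the listing does not say are preserved in the files.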