ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition
Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/gbkszkt8z3
Description:
We developed ANCHOLIK-NER, a Bangla regional Named Entity Recognition (NER) dataset covering the Sylhet, Chittagong, Barishal, Mymensingh, and Noakhali dialects. It comprises 17,405 sentences, evenly distributed across the five regions, with entities categorized into 10 types. The raw sentences were collected from two publicly available datasets and by web scraping online newspapers and articles.
Contributing institutions:
Bangladesh University of Engineering and Technology; Southeast University; Ahsanullah University of Science and Technology



