DaNERcorp: a Named Entity Recognition Corpus in the Moroccan Dialect

Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/286sss4k9v/1
Description:
DaNERcorp is a manually annotated corpus for Named Entity Recognition (NER) in the Moroccan dialect, Darija. The corpus contains more than 65K tokens, 13.8% of which are named entities. Each named entity is annotated with one of four tags under the IOB2 tagging scheme: person (PER), location (LOC), organization (ORG), or miscellaneous (MISC). Named entities are distributed across the tags as follows: PER (15.3%), LOC (38.1%), ORG (15.5%), MISC (31.1%).
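
To illustrate how IOB2 tags of this kind are usually read, here is a minimal Python sketch that groups a tag sequence into entity spans and counts them per label. The function name `iob2_to_spans` and the sample tag sequence are hypothetical and are not taken from the corpus; the listing does not state the corpus file format, so the sketch works on an in-memory list of tags only.

```python
from collections import Counter


def iob2_to_spans(tags):
    """Group IOB2 tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An open entity continues only on an I- tag with the same label.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # In IOB2 every entity begins with a B- tag.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Assumption: an I- tag with no matching open entity is ignored.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tag sequence, not drawn from DaNERcorp.
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "I-ORG"]
spans = iob2_to_spans(tags)
label_counts = Counter(label for label, _, _ in spans)
```

Counting spans per label in this way is one plausible reading of the per-tag distribution quoted above; the listing does not say whether those percentages are computed over entities or over entity tokens.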

Provided by:
Hanane Nour Mousa