DarNERcorp: a Named Entity Recognition Corpus in the Moroccan Dialect

Name: DarNERcorp: a Named Entity Recognition Corpus in the Moroccan Dialect
Creator: Hanane Nour Mousa
License: No description available

Indexed from Mendeley Data on 2026-04-09

Download link:

https://data.mendeley.com/datasets/286sss4k9v

Description:

DarNERcorp is a manually annotated corpus for Named Entity Recognition (NER) in the Moroccan dialect, or Darija. The corpus contains more than 65K tokens, 13.8% of which are tagged as named entities. Each named entity is annotated with one of four types using the BIO tagging scheme: person (PER), location (LOC), organization (ORG), and miscellaneous (MISC). The distribution of named entities across these types is PER (15.3%), LOC (38.1%), ORG (15.5%), and MISC (31.1%). The corpus is provided in the Data folder and is split into two sets: DarNERcorp_train (80% of the data) and DarNERcorp_test (20%). In addition to the data, the Python scripts used for data collection and formatting are provided in the Code folder.
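The description does not state the on-disk layout of DarNERcorp_train and DarNERcorp_test. The sketch below assumes a common CoNLL-style layout (one token and its BIO tag per line, whitespace-separated, with blank lines between sentences) and shows how BIO tags group into entity spans and how statistics like those reported above could be recomputed. The functions `read_bio`, `bio_spans`, and `corpus_stats` are hypothetical names, not part of the released Code folder, and the sample text is invented for illustration rather than drawn from the corpus. The description also does not say whether the per-type percentages are computed over entity mentions or entity tokens; this sketch counts mentions.

```python
from collections import Counter


def read_bio(text):
    """Parse an ASSUMED CoNLL-style layout: 'token tag' per line,
    blank lines separating sentences. Returns a list of sentences,
    each a list of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit(maxsplit=1)  # tag is the last field
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def bio_spans(tags):
    """Group a BIO tag sequence into (type, start, end) spans, end exclusive.
    A B- tag always opens a new span; an I- tag whose type differs from the
    open span (or follows O) is leniently treated as opening a new span."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = i, tag[2:]
        elif tag == "O":
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


def corpus_stats(sentences):
    """Return (token count, % of tokens with a non-O tag,
    {type: % of entity mentions})."""
    n_tokens, n_entity_tokens, by_type = 0, 0, Counter()
    for sent in sentences:
        tags = [tag for _, tag in sent]
        n_tokens += len(tags)
        n_entity_tokens += sum(tag != "O" for tag in tags)
        by_type.update(etype for etype, _, _ in bio_spans(tags))
    total = sum(by_type.values())
    share = {t: 100 * c / total for t, c in by_type.items()} if total else {}
    pct = 100 * n_entity_tokens / n_tokens if n_tokens else 0.0
    return n_tokens, pct, share


# Invented toy sample (not from DarNERcorp), two sentences.
sample = "Ahmed B-PER\nmcha O\nl O\nRabat B-LOC\n\nBank B-ORG\nAl I-ORG\nMaghrib I-ORG\n"
n_tokens, pct, share = corpus_stats(read_bio(sample))
print(n_tokens, round(pct, 1))  # 7 71.4
print(share)
```

Counting mentions rather than tokens means a three-token organization name contributes one ORG entry to the distribution; switching to token-level counts would only require counting non-O tags by type instead of spans.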

Provided by:

Hanane Nour Mousa