IIC/CoNLL-NERC-es

Hugging Face · Updated 2026-02-06 · Indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/IIC/CoNLL-NERC-es

Description:

CoNLL-NERC-es is the Spanish dataset of the CoNLL-2002 Shared Task. It is annotated with four types of named entities (persons, locations, organizations, and other miscellaneous entities) in the standard Beginning-Inside-Outside (BIO) format. The corpus consists of 8,324 training sentences with 19,400 named entities, 1,916 development sentences with 4,568 named entities, and 1,518 test sentences with 3,644 named entities. The data is a collection of newswire articles from May 2000, made available by the Spanish EFE News Agency. The annotation was carried out by the TALP Research Center of the Technical University of Catalonia (UPC) and the Center of Language and Computation (CLiC) of the University of Barcelona (UB), and was funded by the European Commission through the NAMIC project. The dataset is used for Named Entity Recognition and Classification (NERC) in Spanish.
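As a rough illustration of the BIO format mentioned in the description, the sketch below groups per-token tags into entity spans. The tag names (`B-PER`, `I-PER`, `B-LOC`, and so on) follow the usual CoNLL convention and are assumed here rather than taken from this listing. The example sentence is made up and not drawn from the corpus, and `bio_to_spans` is a hypothetical helper, not part of any dataset tooling.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive.

    Assumes tags look like "B-PER", "I-PER", or "O" (an assumption, see above).
    A stray "I-X" that does not continue an open span of type X is treated
    leniently as the start of a new span.
    """
    spans = []
    start = None
    current = None
    for i, tag in enumerate(tags):
        # "B-PER" -> prefix "B", type "PER"; "O" -> prefix "O", empty type.
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if not continues:
            # Close the open span, if any, before handling the new tag.
            if current is not None:
                spans.append((current, start, i))
                start, current = None, None
            if prefix in ("B", "I"):
                start, current = i, etype
    # Close a span that runs to the end of the sentence.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Illustrative sentence only; not taken from the dataset.
tokens = ["Javier", "Solana", "visitó", "Madrid", "ayer", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
entities = [(etype, " ".join(tokens[s:e])) for etype, s, e in bio_to_spans(tags)]
print(entities)  # [('PER', 'Javier Solana'), ('LOC', 'Madrid')]
```

Counting the spans returned for every sentence in a split is one way to check entity totals such as those quoted above, assuming the files use the same tag scheme.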

Provider:

IIC
