IIC/pharmaco-ner

Hugging Face · Updated 2026-02-06 · Indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/IIC/pharmaco-ner

Resource description:

PharmaCoNER is a manually classified collection of Spanish clinical case studies derived from the Spanish Clinical Case Corpus (SPACCC). The corpus contains 396,988 words across 1,000 clinical cases, randomly split into three subsets: a training set (500 cases), a development set (250 cases), and a test set (250 cases). In terms of examples, the training, development, and test sets contain 8,129, 3,787, and 3,952 annotated sentences respectively. The dataset defines 4 entity types: NORMALIZABLES, NO_NORMALIZABLES, PROTEINAS, and UNCLEAR. It is designed for the Named Entity Recognition (NER) task, and its language is Spanish. The dataset was created following the guidelines used for similar datasets, and all entity mentions were annotated by domain experts. It contains no personal or sensitive information and is intended to support the development of medical language models for Spanish.
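The figures in the description can be turned into a few quick sanity checks. The sketch below is illustrative only: the split keys (`train`, `dev`, `test`), the helper names, and above all the BIO tagging scheme are assumptions that the description does not confirm, and the split names and label format actually published on the Hub may differ. `load_pharmaconer` relies on the third-party `datasets` library and needs network access, so it is defined but not called here.

```python
from typing import Dict, List

# Figures quoted in the dataset description.
CASES: Dict[str, int] = {"train": 500, "dev": 250, "test": 250}
SENTENCES: Dict[str, int] = {"train": 8129, "dev": 3787, "test": 3952}
ENTITY_TYPES: List[str] = [
    "NORMALIZABLES",
    "NO_NORMALIZABLES",
    "PROTEINAS",
    "UNCLEAR",
]


def sentences_per_case() -> Dict[str, float]:
    """Average number of annotated sentences per clinical case, per split."""
    return {split: SENTENCES[split] / CASES[split] for split in CASES}


def bio_labels(entity_types: List[str] = ENTITY_TYPES) -> List[str]:
    """Build a token-level label set.

    ASSUMPTION: a BIO scheme (O, B-<type>, I-<type>). The description only
    lists the entity types, not the tagging scheme used in the files.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


def load_pharmaconer():
    """Fetch the dataset from the Hugging Face Hub (hypothetical helper).

    Imported lazily so the rest of this sketch runs without `datasets`
    installed. Inspect the returned object for its real split and
    column names before relying on them.
    """
    from datasets import load_dataset

    return load_dataset("IIC/pharmaco-ner")


if __name__ == "__main__":
    for split, avg in sentences_per_case().items():
        print(f"{split}: {avg:.2f} sentences per case")
    print(bio_labels())
```

Under the BIO assumption, four entity types give nine token labels, which is the output size a token-classification head would need.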

Provided by:

IIC
