IIC/CT-EBM-SP

Hugging Face, updated 2026-02-06, indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/IIC/CT-EBM-SP

Description:

The CT-EBM-SP (Clinical Trials for Evidence-Based Medicine in Spanish) corpus is a collection of 1,200 Spanish-language texts about clinical trials: 500 journal abstracts published under a Creative Commons license (available in PubMed or SciELO) and 700 clinical trial announcements from the European Clinical Trials Register and the Spanish Clinical Trials Repository. The corpus is annotated with Unified Medical Language System (UMLS) entities in four categories, Anatomy (ANAT), Chemicals (CHEM), Disorders (DISO), and Procedures (PROC), for a total of 46,699 entities. The dataset is divided into training (175,203 tokens, 28,101 entities), development (58,670 tokens, 9,629 entities), and test (58,300 tokens, 8,969 entities) sets. It is intended primarily for medical named entity recognition (NER); it is still under development and is not recommended for medical decision-making.
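
The split statistics in the description can be cross-checked with a few lines of Python. This is a minimal sketch that uses only the numbers quoted above (it does not download the dataset); the names SPLITS and summarize are illustrative and not part of any published loader for this corpus.

```python
# Split sizes as quoted in the dataset description (not read from the data files).
SPLITS = {
    "train": {"tokens": 175_203, "entities": 28_101},
    "dev": {"tokens": 58_670, "entities": 9_629},
    "test": {"tokens": 58_300, "entities": 8_969},
}

# Total entity count stated in the description.
REPORTED_TOTAL_ENTITIES = 46_699


def summarize(splits):
    """Return total tokens, total entities, per-split token share,
    and per-split entity density (entities per 1,000 tokens)."""
    total_tokens = sum(s["tokens"] for s in splits.values())
    total_entities = sum(s["entities"] for s in splits.values())
    shares = {name: s["tokens"] / total_tokens for name, s in splits.items()}
    density = {name: s["entities"] / s["tokens"] * 1000 for name, s in splits.items()}
    return total_tokens, total_entities, shares, density


if __name__ == "__main__":
    tokens, entities, shares, density = summarize(SPLITS)
    print(f"total tokens: {tokens:,}")  # total tokens: 292,173
    print(f"total entities: {entities:,} (reported: {REPORTED_TOTAL_ENTITIES:,})")
    for name in SPLITS:
        print(f"{name}: {shares[name]:.1%} of tokens, "
              f"{density[name]:.1f} entities per 1k tokens")
```

The per-split entity counts sum exactly to the reported 46,699, and the token counts (292,173 in total) correspond to a split of roughly 60/20/20 across training, development, and test.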

Provider:

IIC