IIC/ClinText-SP
Source: Hugging Face. Updated: 2025-03-24. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/IIC/ClinText-SP
Description:
ClinText-SP is the largest publicly available Spanish clinical corpus to date, built to support research in clinical natural language processing. It aggregates clinical texts from a range of open sources: medical journals, annotated corpora from shared tasks, and supplementary sources such as Wikipedia and medical textbooks. The dataset contains 35,996 samples, averaging about 700 tokens each, for a total of approximately 25.62M tokens. It offers a balanced mix of long, well-structured clinical case reports and shorter, schematic texts, which makes it suitable for a variety of clinical NLP tasks.
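As a quick sanity check on the figures quoted above, the following minimal sketch recomputes the mean tokens per sample from the stated sample count and token total. The constant and function names are illustrative, not part of the dataset, and the total is the rounded 25.62M from the description, so the result is only approximate.

```python
# Figures quoted in the description; the token total is rounded to 25.62M.
NUM_SAMPLES = 35_996
TOTAL_TOKENS = 25_620_000


def mean_tokens_per_sample(total_tokens: int, num_samples: int) -> float:
    """Return the average number of tokens per sample."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return total_tokens / num_samples


mean_tokens = mean_tokens_per_sample(TOTAL_TOKENS, NUM_SAMPLES)
print(f"about {mean_tokens:.0f} tokens per sample")
```

The computed mean lands slightly above 700, which is consistent with the "about 700 tokens per sample" figure in the description.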
Provided by:
IIC



