Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) Corpus and word-embeddings

Name: Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) Corpus and word-embeddings
Creator: e-cienciaDatos
Published: 2025-11-12 09:19:13
License: No description available

DataCite Commons, updated 2025-11-12, indexed 2025-04-10

Download link:

https://edatos.consorciomadrono.es/citation?persistentId=doi:10.21950/1YCBDV

Resource description:

<p>The NLPMedTerm project aims to provide the research community with resources for natural language processing (NLP) in the health domain for Spanish. One work package of the project is a corpus of texts annotated with medical entities, intended as a resource for experiments in named entity recognition. The corpus is aimed at training machine-learning models that incorporate state-of-the-art neural network approaches. Another work package provides word embeddings from the medical domain.</p>
<p>The Clinical Trials for Evidence-Based-Medicine in Spanish (CT-EBM-SP) corpus is a collection of 1200 texts about clinical trial studies and clinical trial announcements:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO)
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos</p>
<p>The word embeddings were trained with fastText using the following parameters: skipgram model, window size = 10, dimensions = 100, minimum frequency = 1, number of negatives sampled = 10, learning rate = 1e-4. The training texts came from the European Medicines Agency corpus (∼13.9M tokens) and from articles in the Scientific Electronic Library Online (SciELO) repository (∼25M tokens).</p>
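The training parameters listed in the description can be written down as a fastText configuration. The following is a minimal sketch, not the authors' actual training script: the file names `corpus.txt` and `ct_ebm_sp_vectors` and the helper names `build_cli_args` and `train` are hypothetical, and any setting the description does not mention (for example the number of epochs) is left at the fastText default, which may differ from what was actually used.

```python
import shlex

# Hyperparameters as stated in the dataset description.
# Keys follow the fastText option names; everything else is left at defaults
# (an assumption, since the description does not list other settings).
CT_EBM_SP_PARAMS = {
    "model": "skipgram",
    "ws": 10,        # window size
    "dim": 100,      # vector dimensions
    "minCount": 1,   # minimum word frequency
    "neg": 10,       # number of negatives sampled
    "lr": 1e-4,      # learning rate
}


def build_cli_args(input_path, output_prefix, params=CT_EBM_SP_PARAMS):
    """Return the argument list for the fastText command-line tool."""
    args = ["fasttext", params["model"],
            "-input", input_path, "-output", output_prefix]
    for key in ("ws", "dim", "minCount", "neg", "lr"):
        args += [f"-{key}", str(params[key])]
    return args


def train(input_path, params=CT_EBM_SP_PARAMS):
    """Train with the fastText Python bindings (requires the fasttext package)."""
    import fasttext  # imported lazily so the helpers above work without it
    return fasttext.train_unsupervised(input_path, **params)


if __name__ == "__main__":
    # Hypothetical paths: a plain-text training corpus and an output prefix.
    print(shlex.join(build_cli_args("corpus.txt", "ct_ebm_sp_vectors")))
```

Running the script prints the equivalent command line; calling `train("corpus.txt")` would instead train in-process, provided the `fasttext` package is installed and the corpus is a plain-text file.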

Provider:

e-cienciaDatos

Created:

2022-03-11