Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) Corpus and word-embeddings

DataCite Commons | Updated 2025-11-12 | Indexed 2025-04-10
Download link:
https://edatos.consorciomadrono.es/citation?persistentId=doi:10.21950/1YCBDV
Resource description:
The NLPMedTerm project aims to provide the research community with resources for natural language processing (NLP) in the health domain for Spanish. One Work Package of the project is a corpus of texts annotated with medical entities, intended as a resource for experiments in Named Entity Recognition. The corpus is aimed at training machine-learning models that incorporate state-of-the-art neural network approaches. Another Work Package comprises word embeddings from the medical domain.

The Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus is a collection of 1200 texts about clinical trial studies and clinical trial announcements:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO)
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos

The word embeddings were trained with fastText using the following parameters: skipgram model, window size = 10, dimensions = 100, minimum frequency = 1, number of negatives sampled = 10, learning rate = 1e-4. The training data were texts from the European Medicines Agency corpus (∼13.9M tokens) and articles from the Scientific Electronic Library Online (SciELO) repository (∼25M tokens).
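The embedding hyperparameters listed in the description map onto options of the fastText command-line tool. The sketch below only assembles such a command; it is not the authors' actual training script. The file names `corpus.txt` and `ct_ebm_sp_vectors` are hypothetical placeholders, and the record does not say how the texts were preprocessed before training.

```python
import shlex

# Hyperparameters as stated in the dataset description, keyed by the
# corresponding fastText command-line flag names.
PARAMS = {
    "ws": 10,        # window size
    "dim": 100,      # vector dimensions
    "minCount": 1,   # minimum word frequency
    "neg": 10,       # number of negatives sampled
    "lr": 1e-4,      # learning rate
}


def build_fasttext_command(input_path, output_prefix, params=PARAMS):
    """Return the argument list for a fastText skipgram training run."""
    cmd = ["fasttext", "skipgram", "-input", input_path, "-output", output_prefix]
    for flag, value in params.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


# Hypothetical paths: replace with the real training text and output prefix.
command = build_fasttext_command("corpus.txt", "ct_ebm_sp_vectors")
print(shlex.join(command))
```

The printed command can be run in a shell where the `fasttext` binary is installed. Other fastText options (epochs, subword n-gram lengths, and so on) are not mentioned in the description, so they are left at whatever defaults the tool applies.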
Provider:
e-cienciaDatos
Created:
2022-03-11