
Spanish Biomedical Corpus

Source: DataCite Commons. Updated 2025-04-01; indexed 2025-04-16.
Download link:
https://data.mendeley.com/datasets/7btd42m2sc
Description:
Embeddings

This repository contains word embeddings generated from Spanish biomedical text corpora.

Corpus detail

The corpus was gathered from Spanish biomedical texts drawn from several multilingual biomedical sources:

IBECS (Spanish Bibliographical Index in Health Sciences): a corpus that collects scientific journals covering multiple fields in health sciences. Contains titles and abstracts from 168,198 records in English and Spanish.
SciELO (Scientific Electronic Library Online): a corpus that gathers electronic publications of full-text articles from scientific journals of Latin America, South Africa, and Spain. Contains titles and abstracts from 161,710 records in English and Spanish.
PubMed: a free search engine used to access the NLM's MEDLINE database (https://www.ncbi.nlm.nih.gov/pubmed/). Contains titles and abstracts from 127,619 records.
MedlinePlus: a corpus of health topics, drugs and supplements, laboratory test information, and medical encyclopedia texts. Contains 7,033 articles in English and Spanish.
UFAL Medical Corpus: a collection of parallel corpora of medical and general domain texts.

All corpus data files can be found at the following link: http://temu.bsc.es/mespen/

Pre-trained Models

FastText

We used the FastText (Bojanowski et al., 2016) implementation to train our word embeddings on the preprocessed Spanish Biomedical corpus (FastText-SBC). In addition, we trained a concept embedding model by replacing biomedical concepts in the Spanish Biomedical corpus with their unique SNOMED-CT Spanish Edition identifier (SNOMED-SBC). We used the PyMedTermino library (Lamy et al., 2015) for concept indexing, using full-text search and fuzzy search with a threshold.

Train Parameters

Dimension = 300
epoch = 10, 20
min_count = 20
neg = 20
t = 6e-5
thread = 7
encoding = 'utf8'
min subword-ngram = 3
max subword-ngram = 6

Links to the embeddings

FastText-SBC, epoch 10
FastText-SBC, epoch 20
SNOMED-SBC
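
The subword n-gram limits listed in the train parameters (minimum 3, maximum 6) control which character n-grams FastText associates with each word. The sketch below is a simplified pure-Python illustration of that idea, not the code used to build these embeddings. The function name `char_ngrams` and the example word are hypothetical. The real FastText implementation additionally keeps the whole word as its own token and hashes the n-grams into buckets, which this sketch omits.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return FastText-style character n-grams of `word`.

    Illustrative helper only (hypothetical name): it omits the
    whole-word token and the n-gram hashing that FastText performs.
    """
    # FastText marks word boundaries with '<' and '>' before slicing.
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


if __name__ == "__main__":
    # Example Spanish word; prints every n-gram of length 3 to 6.
    for gram in char_ngrams("dolor"):
        print(gram)
```

Because rare or misspelled biomedical terms still share n-grams with known words, a model trained this way can produce a vector for a word that never appeared in the corpus.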
Provider:
Mendeley
Created:
2021-03-31