MeSDiCon - Medical Spanish Disease and symptom name Collection lexicon (unfiltered initial version)
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/3558270
Resource description:
MeSDiCon (Medical Spanish Disease and symptom name Collection lexicon) is a list, or gazetteer, of candidate disease and symptom names mentioned in Spanish clinical texts. It serves as a lexical resource or dictionary for the automatic detection of disease/symptom mentions, and for indexing or classifying medical texts by these concept types.
This collection was generated in a five-step procedure:
1. Automatic detection of mentions of disease/symptom terms in biomedical texts in English (including mapping/normalization to MeSH terms or OMIM identifiers).
2. Generation of a unique name list from the detected concept mentions.
3. Basic filtering of non-disease/symptom names and of highly ambiguous mentions or abbreviations, using basic characteristics such as name morphology and length criteria.
4. Automatic translation of the name lists from English to Spanish using a medical machine translation system (see Soares, F. and Krallinger, M. BSC Participation in the WMT Translation of Biomedical Abstracts. In Proceedings of the Fourth Conference on Machine Translation, Volume 3: Shared Task Papers, pp. 175-178, 2019; https://zenodo.org/record/3346802).
5. Automatic mention lookup of the translated names in a collection of 20 million Spanish clinical notes (primary care and pediatrics).
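The record does not state the actual filtering criteria, so the following is only a rough sketch of what deduplication and a "morphology and length" filter might look like. The function name `looks_like_candidate`, the minimum length of 4, and the all-uppercase abbreviation rule are assumptions made for illustration, not the authors' rules.

```python
def looks_like_candidate(name, min_len=4):
    """Hypothetical filter: keep names that are long enough and not abbreviation-like."""
    stripped = name.strip()
    # Assumed length criterion: very short strings are too ambiguous.
    if len(stripped) < min_len:
        return False
    # Assumed morphology criterion: short all-uppercase strings look like abbreviations.
    if stripped.isupper() and len(stripped) <= 5:
        return False
    # Require at least one alphabetic character.
    if not any(ch.isalpha() for ch in stripped):
        return False
    return True

# Made-up English mentions, including a duplicate, two abbreviations, and a number.
names = ["coagulation disorders", "MS", "COPD", "1234", "fever", "fever"]

# A set removes duplicates (unique name list), then the filter is applied.
kept = sorted({n for n in names if looks_like_candidate(n)})
print(kept)
```

Real criteria would likely need tuning against the data; the thresholds here are placeholders.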
Every term in MeSDiCon is identified by a text span (in Spanish), the target terminology namespace to which it was automatically mapped (MeSH or OMIM), and its corresponding concept identifier in that terminology. In addition, every text span comes with two counts: the absolute term frequency, i.e. the number of matches in the corpus of 20 million clinical notes, and the number of documents (notes) in which it was automatically detected.
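To make the difference between the two counts concrete, here is a minimal sketch that computes both over a toy corpus. The three notes are invented for illustration, and the case-insensitive whole-word regular expression is an assumed matching strategy; the record does not describe how the lookup was actually implemented.

```python
import re

def count_frequencies(term, notes):
    """Return (term frequency, document frequency) of `term` across `notes`."""
    # Assumed matching rule: case-insensitive, not embedded inside a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    term_count = 0
    doc_count = 0
    for note in notes:
        hits = len(pattern.findall(note))
        term_count += hits      # every match counts toward term frequency
        if hits:
            doc_count += 1      # each note counts at most once
    return term_count, doc_count

# Invented toy notes, not taken from the real corpus.
notes = [
    "Paciente con fiebre. La fiebre persiste desde ayer.",
    "Sin fiebre ni tos.",
    "Control rutinario.",
]

tf, df = count_frequencies("fiebre", notes)
print(tf, df)
```

A term repeated within one note raises the term frequency but not the document frequency, which is why the two values can differ.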
Important note: no manual filtering of MeSDiCon was carried out, so some entries may contain errors, caused either by the initial name recognition and concept mapping in English or by incorrect automatic translation into Spanish.
The MeSDiCon resource is provided in two formats:
TSV. Data is separated by tabs (\t). Every row of the file has the following fields:
terminology identifier translatedTerm termCount documentCount
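A minimal sketch of reading rows in this layout with Python's standard `csv` module. Whether the distributed file starts with a header row is not stated, so the reader below skips a row that equals the field names; the function name `read_mesdicon_tsv` is hypothetical, and the sample row reuses the values from the JSON example in this description.

```python
import csv
import io

FIELDS = ["terminology", "identifier", "translatedTerm", "termCount", "documentCount"]

def read_mesdicon_tsv(handle):
    """Parse tab-separated MeSDiCon rows into dictionaries with integer counts."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for parts in reader:
        # Skip an optional header row and any malformed rows.
        if len(parts) != len(FIELDS) or parts == FIELDS:
            continue
        record = dict(zip(FIELDS, parts))
        record["termCount"] = int(record["termCount"])
        record["documentCount"] = int(record["documentCount"])
        records.append(record)
    return records

# In-memory sample standing in for the real file.
sample = (
    "terminology\tidentifier\ttranslatedTerm\ttermCount\tdocumentCount\n"
    "MESH\tD025861\tTrastornos de la coagulación\t9\t9\n"
)
records = read_mesdicon_tsv(io.StringIO(sample))
print(records[0]["translatedTerm"], records[0]["termCount"])
```

For the real file, an open file handle (UTF-8, `newline=""`) would replace the `io.StringIO` object.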
JSON. Records are stored as a list of JSON objects. They have the following fields:
{
"terminology":"MESH",
"identifier":"D025861",
"translatedTerm":"Trastornos de la coagulación",
"termFrequency":9,
"documentFrequency":9
}
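As an illustration of using the JSON format for dictionary lookup, the sketch below loads a list of such objects and indexes them by case-folded Spanish term. The helper name `build_index` and the case-folding normalization are assumptions for this example; the inline JSON string stands in for the real file and contains only the record shown above.

```python
import json
from collections import defaultdict

# Inline stand-in for the JSON file: a list with the single example record.
raw = """[
  {
    "terminology": "MESH",
    "identifier": "D025861",
    "translatedTerm": "Trastornos de la coagulación",
    "termFrequency": 9,
    "documentFrequency": 9
  }
]"""

def build_index(records):
    """Map each case-folded Spanish term to its (terminology, identifier) pairs."""
    index = defaultdict(list)
    for rec in records:
        key = rec["translatedTerm"].casefold()
        index[key].append((rec["terminology"], rec["identifier"]))
    return index

index = build_index(json.loads(raw))
print(index["trastornos de la coagulación"])
```

A list is kept per term because the same Spanish string could in principle map to more than one concept identifier.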
Copyright (c) 2019 Secretaría de Estado para el Avance Digital
Created:
2022-11-05



