CodiEsp-abstracts: Abstracts from Lilacs and Ibecs with ICD10 codes
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3606625
Description:
JSON file with abstracts from Lilacs and Ibecs and the ICD10 codes (ICD10-CM and ICD10-PCS; CIE10 in Spanish) associated with them.
Please cite us:
Miranda-Escalada, A., Gonzalez-Agirre, A., Armengol-Estapé, J., Krallinger, M.: Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of eHealth CLEF 2020. In: CLEF (Working Notes) (2020)
@inproceedings{miranda2020overview,
title={Overview of automatic clinical coding: annotations, guidelines, and solutions for non-english clinical cases at codiesp track of CLEF eHealth 2020},
author={Miranda-Escalada, Antonio and Gonzalez-Agirre, Aitor and Armengol-Estap{\'e}, Jordi and Krallinger, Martin},
booktitle={Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings},
year={2020} }
The Lilacs and Ibecs databases provide MeSH terms describing some of their documents. Using the UMLS Metathesaurus, those MeSH terms have been translated into ICD10 codes (ICD10-CM and ICD10-PCS). Every abstract has at least one ICD10 code.
In addition, each MeSH code given by the databases (Lilacs and Ibecs) has a "word" describing it. These "words" have been used to add further ICD10 codes: we applied strict string matching to check whether each "word" was a descriptor of any ICD10 code (in the Spanish version, CIE10).
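The strict matching step can be pictured with the minimal sketch below. The descriptor table, the normalisation (trimming and lowercasing) and the function name are assumptions made for illustration; the description does not say how strings were normalised or which CIE10 descriptor list was used.

```python
def match_words_to_cie10(words, cie10_descriptors):
    """Return {word: [codes]} for words that exactly equal a CIE10 descriptor.

    `cie10_descriptors` maps a CIE10 code to its Spanish descriptor.
    This structure is hypothetical, not taken from the dataset.
    """
    # Reverse index: normalised descriptor -> list of codes sharing it.
    index = {}
    for code, descriptor in cie10_descriptors.items():
        index.setdefault(descriptor.strip().lower(), []).append(code)

    matches = {}
    for word in words:
        key = word.strip().lower()  # assumed normalisation
        if key in index:  # whole-string equality only, no partial matches
            matches[word] = sorted(index[key])
    return matches


# Toy data for illustration only; codes and descriptors are placeholders.
toy_descriptors = {"X00": "descriptor uno", "X01": "descriptor dos"}
toy_words = ["Descriptor uno", "descriptor", "otro término"]
result = match_words_to_cie10(toy_words, toy_descriptors)
```

In this toy run only "Descriptor uno" matches, because "descriptor" is merely a substring of a descriptor and strict matching requires the whole string to be equal.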
The format of the JSON file is the following:
{'articles':
[{'title': 'title',
'pmid': 'pmid',
'abstractText': 'abstract (in Spanish)',
'Mesh':
[{'Code': 'MeSHCode',
'Word': 'reference',
'CIE': [CIE10_1, CIE10_2, ...]},
...]
},
...]
}
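As a rough sketch of how this layout might be consumed, the snippet below collects the CIE10 codes attached to each article. It assumes the distributed file is valid JSON (double-quoted keys, unlike the single-quoted outline above) and that 'CIE' holds a list of code strings; the function name and the sample values are placeholders.

```python
import json


def codes_by_article(data):
    """Map each article's pmid to the sorted, de-duplicated CIE10 codes in its 'Mesh' list."""
    result = {}
    for article in data.get("articles", []):
        codes = set()
        for mesh in article.get("Mesh", []):
            codes.update(mesh.get("CIE", []))
        result[article["pmid"]] = sorted(codes)
    return result


# Small in-memory sample mirroring the documented layout; values are placeholders.
sample = json.loads("""
{"articles": [
  {"title": "title", "pmid": "pmid-1", "abstractText": "texto",
   "Mesh": [{"Code": "MeSHCode", "Word": "reference", "CIE": ["c1", "c2"]},
            {"Code": "MeSHCode2", "Word": "reference2", "CIE": ["c2"]}]}
]}
""")
print(codes_by_article(sample))
```

For the real file, the same function could be applied to the result of `json.load` on a handle opened with `encoding="utf-8"`; the file name inside the archive is not given here, so it is left out.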
Additionally, the compressed file includes a folder with all the abstracts extracted in individual UTF-8 encoded text files and a tab-separated file with 4 fields:
pmid label cie10-code word
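The tab-separated file could be read along these lines. This is a sketch under two assumptions not confirmed by the description: that the file has no header row, and that the columns appear in the order listed above. The sample rows are placeholders.

```python
import csv
import io

# Column names taken from the field list above; the order is assumed.
FIELDS = ["pmid", "label", "cie10-code", "word"]


def read_rows(handle):
    """Yield one dict per well-formed line of the tab-separated file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip lines that do not have exactly 4 fields
        yield dict(zip(FIELDS, row))


# In-memory stand-in for the real file; values are placeholders.
sample = io.StringIO(
    "pmid-1\tlabel-a\tc1\tword one\n"
    "pmid-2\tlabel-b\tc2\tword two\n"
)
rows = list(read_rows(sample))
```

`csv.QUOTE_NONE` keeps any quotation marks inside the "word" field as literal characters instead of treating them as CSV quoting.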
Summary statistics:
number of abstracts: 355 840
number of abstracts with at least one ICD10 code: 176 294
percentage of MeSH codes mapped to ICD10: 10.6% (of 2 526 772 MeSH codes, 266 949 were mapped to ICD10)
average number of MeSH codes per article: 7.1
average number of ICD10 codes per article: 2.5
number of ICD10 codes that have an associated MeSH code in UMLS: 3293
number of ICD10 codes that have an associated MeSH code in UMLS and appear in this dataset: 3082
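The derived figures above can be re-computed from the raw counts as a quick sanity check. The variable names are illustrative, and the share of abstracts with a code is a derived value that the description does not state.

```python
# Raw counts copied from the summary statistics.
abstracts_total = 355_840
abstracts_with_icd10 = 176_294
mesh_codes_total = 2_526_772
mesh_codes_mapped = 266_949

pct_mapped = 100 * mesh_codes_mapped / mesh_codes_total
mesh_per_article = mesh_codes_total / abstracts_total
pct_abstracts_coded = 100 * abstracts_with_icd10 / abstracts_total  # derived, not stated

print(f"MeSH codes mapped to ICD10: {pct_mapped:.1f}%")            # 10.6%
print(f"MeSH codes per article: {mesh_per_article:.1f}")           # 7.1
print(f"Abstracts with an ICD10 code: {pct_abstracts_coded:.1f}%")  # 49.5%
```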
Created:
2021-05-07



