
CodiEsp-abstracs: Abstracts from Lilacs and Ibecs with ICD10 codes

Indexed from NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/3606625
Description:
JSON file with abstracts from Lilacs and Ibecs, together with the ICD10 codes (ICD10-CM and ICD10-PCS; CIE10 in Spanish) associated with them.

Please cite us:

Miranda-Escalada, A., Gonzalez-Agirre, A., Armengol-Estapé, J., Krallinger, M.: Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of eHealth CLEF 2020. In: CLEF (Working Notes) (2020)

    @inproceedings{miranda2020overview,
      title={Overview of automatic clinical coding: annotations, guidelines, and solutions for non-english clinical cases at codiesp track of CLEF eHealth 2020},
      author={Miranda-Escalada, Antonio and Gonzalez-Agirre, Aitor and Armengol-Estap{\'e}, Jordi and Krallinger, Martin},
      booktitle={Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings},
      year={2020}
    }

The Lilacs and Ibecs databases provide MeSH terms describing some of their documents. Using the UMLS Metathesaurus, those MeSH terms have been translated into ICD10 codes (ICD10-CM and ICD10-PCS). Every abstract has at least one ICD10 code.

In addition, each MeSH code given by the databases (Lilacs and Ibecs) has a "word" describing it. These "words" have been used to add further ICD10 codes: strict string matching was applied to check whether each "word" is a descriptor of any ICD10 code (in the Spanish version, CIE10).

The format of the JSON file is the following:

    {'articles': [
        {'title': 'title',
         'pmid': 'pmid',
         'abstractText': 'abstract (in Spanish)',
         'Mesh': [
             {'Code': 'MeSHCode', 'Word': 'reference', 'CIE': [CIE10_1, CIE10_2, ...]},
             ...
         ]},
        ...
    ]}

Additionally, the compressed file includes a folder with all the abstracts extracted as individual UTF-8 encoded text files, and a tab-separated file with 4 fields:

    pmid    label    cie10-code    word

Summary statistics:

    number of abstracts: 355 840
    number of abstracts with at least one ICD10 code: 176 294
    percentage of MeSH codes mapped to ICD10: 10.6% (there were 2 526 772 MeSH codes, of which 266 949 were mapped to ICD10)
    average number of MeSH codes per article: 7.1
    average number of ICD10 codes per article: 2.5
    number of ICD10 codes that have an associated MeSH code in UMLS: 3293
    number of ICD10 codes that have an associated MeSH code in UMLS and appear in this dataset: 3082
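As a rough illustration of how the JSON layout described above could be consumed, the following Python sketch collects the CIE10 codes attached to each article. It assumes the file parses as standard JSON with the keys shown in the format description; the function name codes_by_article and the sample values are hypothetical and are not taken from the dataset.

```python
import json


def codes_by_article(dataset):
    """Map each article's pmid to the sorted, de-duplicated CIE10 codes
    found across its 'Mesh' entries (hypothetical helper)."""
    result = {}
    for article in dataset.get("articles", []):
        codes = set()
        for mesh in article.get("Mesh", []):
            # Each MeSH entry may carry zero or more CIE10 codes.
            codes.update(mesh.get("CIE", []))
        result[article["pmid"]] = sorted(codes)
    return result


# Made-up sample in the documented shape; values are illustrative only.
sample = json.loads("""
{"articles": [
  {"title": "t1", "pmid": "p1", "abstractText": "texto",
   "Mesh": [{"Code": "M1", "Word": "w1", "CIE": ["a00", "b01"]},
            {"Code": "M2", "Word": "w2", "CIE": ["a00"]}]},
  {"title": "t2", "pmid": "p2", "abstractText": "texto", "Mesh": []}
]}
""")

mapping = codes_by_article(sample)
```

For the real data, the same function could be applied to the result of json.load on the JSON file from the archive, opened with encoding="utf-8"; the file name is not stated in the description, so it is left to the reader.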
Created:
2021-05-07