
AI4PROFHEALTH - Automatic Occupations Gazetteer and Occupations Co-occurrence with Clinical Concepts

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14205071
Resource description:
This dataset comprises an occupations gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports. In addition, this dataset includes the co-occurrences among occupations, and of professions with other clinical concepts that have been extracted automatically, including diseases, procedures, symptoms, species, drugs, and neoplasia morphologies.

The repository contains one .zip file with all the results obtained from Mesinesp2 and another with the results for the clinical cases. Each archive contains a .tsv file for the professions gazetteer, one for the professions internal co-occurrence, and one for the professions co-occurrence with other semantic classes:

mesinesp2_profession_gazetteer_and_cooccurrence.zip (Mesinesp2)
- mesinesp2_professions_gazetteer.tsv
- mesinesp2_professions_cooccurrences.tsv
- mesinesp2_professions_cooccurrences_with_other_classes.tsv

clinicalcases_profession_gazetteer_and_cooccurrence.zip (clinical cases)
- clinicalcases_professions_gazetteer.tsv
- clinicalcases_professions_cooccurrences.tsv
- clinicalcases_professions_cooccurrences_with_other_classes.tsv

The gazetteer is divided into two columns, one with the name of the extracted terms and the other with their total count. The professions co-occurrences .tsv file is divided into three columns. The first two columns contain the professions that co-occur, and the third column, named "count," indicates the number of co-occurrences calculated at the document level for each pair of detected entities. The professions co-occurrences with other classes .tsv file is structured in a similar manner: the first column contains professions, the second the clinical concepts that they co-occur with, the third the counts of co-occurrences, and the fourth the class to which the clinical concept belongs.
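The four-column layout described for the "co-occurrences with other classes" files can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: it assumes each .tsv file starts with a header row and reads columns by position (profession, clinical concept, count, class), since only the "count" column name is stated in the description. The function names `read_tsv_rows` and `top_concepts_by_class` are hypothetical helpers introduced for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def read_tsv_rows(path: str) -> List[List[str]]:
    """Read a tab-separated file and return its data rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # assumption: the first row is a header
        return [row for row in reader if row]


def top_concepts_by_class(
    rows: Iterable[Sequence[str]], profession: str, limit: int = 5
) -> Dict[str, List[Tuple[str, int]]]:
    """Group the concepts co-occurring with one profession by class.

    Columns are taken by position, following the description:
    profession, clinical concept, count, class.
    Within each class, concepts are sorted by descending count.
    """
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for row in rows:
        prof, concept, count, concept_class = row[:4]
        if prof == profession:
            grouped[concept_class].append((concept, int(count)))
    return {
        cls: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:limit]
        for cls, pairs in grouped.items()
    }


# Example usage after unzipping the clinical cases archive
# (the profession string is a placeholder, not a value taken from the data):
# rows = read_tsv_rows("clinicalcases_professions_cooccurrences_with_other_classes.tsv")
# print(top_concepts_by_class(rows, "some profession"))
```

Reading by position rather than by header name avoids guessing column labels that the description does not give. If the files turn out to have no header row, the `next(reader, None)` line should be removed.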
License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact
If you have any questions or suggestions, please contact us at:
- Sergi Marsol Torrent ()
- Martin Krallinger ()

Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
- MESINESP-2 (corpus of manually indexed records with DeCS/MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
- MEDDOPROF corpus
- Annotation Guidelines

Acknowledgements
This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).
Created:
2024-12-12