AI4PROFHEALTH - Automatic Occupations Gazetteer and Occupations Co-occurrence with Clinical Concepts
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14205071
Resource description:
This dataset comprises an occupations gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports. The dataset also includes automatically extracted co-occurrences among occupations, and between occupations and other clinical concepts: diseases, procedures, symptoms, species, drugs, and neoplasia morphologies.
The repository contains a .zip file for all the results obtained from Mesinesp2 and another for the clinical cases, both containing a .tsv file for the professions gazetteer, one for the professions internal co-occurrence, and one for the professions co-occurrence with other semantic classes:
- mesinesp2_profession_gazetteer_and_cooccurrence.zip (Mesinesp2)
  - mesinesp2_professions_gazetteer.tsv
  - mesinesp2_professions_cooccurrences.tsv
  - mesinesp2_professions_cooccurrences_with_other_classes.tsv
- clinicalcases_profession_gazetteer_and_cooccurrence.zip (clinical cases)
  - clinicalcases_professions_gazetteer.tsv
  - clinicalcases_professions_cooccurrences.tsv
  - clinicalcases_professions_cooccurrences_with_other_classes.tsv
The gazetteer is divided into two columns, one with the name of the extracted terms and the other one with their total count.
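As an illustration of the two-column layout, the following minimal sketch reads a gazetteer into a dictionary and ranks terms by count. It assumes the file has a header row and accesses columns by position, because the header names are not stated here. The function name `load_gazetteer` and the sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io


def load_gazetteer(handle, has_header=True):
    """Read a two-column TSV (term, total count) into a dict of term -> int."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    return {row[0]: int(row[1]) for row in reader if len(row) >= 2}


# Invented sample rows standing in for a real gazetteer file.
sample = "term\tcount\nnurse\t40\nphysician\t55\nfarmer\t5\n"

gazetteer = load_gazetteer(io.StringIO(sample))
total = sum(gazetteer.values())

# Rank terms from most to least frequent and show their share of all mentions.
ranked = sorted(gazetteer.items(), key=lambda kv: kv[1], reverse=True)
for term, count in ranked:
    print(f"{term}\t{count}\t{count / total:.2%}")
```

To try this on one of the extracted files, replace the `io.StringIO(sample)` argument with a handle such as `open("mesinesp2_professions_gazetteer.tsv", encoding="utf-8", newline="")`. The UTF-8 encoding is an assumption.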
The professions co-occurrences .tsv file is divided into three columns. The first two columns contain the professions that co-occur, and the third column, named "count," indicates the number of co-occurrences calculated at the document level for each pair of detected entities. The professions co-occurrences with other classes .tsv file is structured in a similar manner, where the first column contains professions, the second one the clinical concepts that they co-occur with, the third one the counts of co-occurrences, and the fourth one the class to which the clinical concept belongs.
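The four-column layout of the co-occurrences with other classes file can be explored with a short script. The sketch below groups the clinical concepts that co-occur with one profession by their semantic class and keeps the most frequent ones. It assumes a header row and the column order described above (profession, concept, count, class). The function names, header labels, and sample rows are hypothetical and serve only to make the example runnable.

```python
import csv
import io
from collections import defaultdict


def read_tsv_rows(handle, has_header=True):
    """Yield non-empty rows of a tab-separated file as lists of strings."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        if row:
            yield row


def top_concepts_by_class(handle, profession, k=3):
    """Return {class: [(concept, count), ...]} for one profession, highest count first."""
    by_class = defaultdict(list)
    # Column order assumed from the description: profession, concept, count, class.
    for prof, concept, count, sem_class in read_tsv_rows(handle):
        if prof == profession:
            by_class[sem_class].append((concept, int(count)))
    return {
        sem_class: sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        for sem_class, pairs in by_class.items()
    }


# Invented sample rows; real files hold the document-level counts.
sample = (
    "profession\tconcept\tcount\tclass\n"
    "nurse\tdiabetes\t12\tdisease\n"
    "nurse\tfever\t5\tsymptom\n"
    "nurse\tasthma\t20\tdisease\n"
    "physician\tdiabetes\t7\tdisease\n"
)

result = top_concepts_by_class(io.StringIO(sample), "nurse")
print(result)
```

The three-column professions co-occurrence file can be read with the same `read_tsv_rows` helper by unpacking three values per row instead of four.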
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contact
If you have any questions or suggestions, please contact us at:
- Sergi Marsol Torrent
- Martin Krallinger
Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
MESINESP-2 (corpus of manually indexed records with DeCS/MeSH terms, comprising scientific literature abstracts, clinical trials, and patent abstracts; a different document collection)
MEDDOPROF corpus
Annotation Guidelines
Acknowledgements
This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).
Created:
2024-12-12



