AI4PROFHEALTH - Profession-health status co-occurrence graph statistics
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14223004
Description:
This dataset contains the Pointwise Mutual Information (PMI) values for co-occurrence pairs between different mention categories extracted from two distinct clinical datasets: MESINESP2 and the Clinical Case Reports Collection. PMI is a statistical measure used to assess the strength of association between pairs of entities by comparing their observed co-occurrence to the expected frequency under the assumption of independence.
The datasets include PMI values for each co-occurrence pair, derived from the association of professions and clinical concepts, with the aim of identifying potential occupational health risks. By sharing these datasets, we aim to support further research into the relationships between professions and clinical entities, enabling the development of more accurate and targeted occupational health risk models.
There is a separate file for each corpus. Each file is provided in CSV format and holds the PMI values for the co-occurrence pairs extracted from that corpus.
Data Structure:
MESINESP2: mesinesp2_co-occurrence_pmi.zip
Clinical case reports: clinical_cases_co-occurrence_pmi.zip
The repository contains one .zip file per corpus, each containing a .csv file with the co-occurrences between the detected professions and clinical entities. The columns appear in the following order:
span_mention_1: Mention string (original): profession
normalized_entity_1: Controlled vocabulary entry for this term
mention1_category: Semantic class (i.e., NER label)
mention1_freq: Absolute frequency of entity 1
span_mention_2: Mention string (original): entity 2 (disease, symptom, species, etc.)
normalized_entity_2: Controlled vocabulary entry for this term
mention2_category: Semantic class (i.e., NER label)
mention2_freq: Absolute frequency of entity 2
co-occurrence: Number of co-occurrences
PMID: PMID value
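The column layout above can be read with the Python standard library alone. The sketch below rests on several assumptions not confirmed by the description: each archive holds one .csv file, the file is UTF-8 encoded and comma-delimited, and its header row uses the column names listed above. The helper names `read_pairs` and `frequent_pairs` are hypothetical.

```python
import csv
import io
import zipfile


def read_pairs(zip_path):
    """Yield one dict per co-occurrence row from the first .csv in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise FileNotFoundError(f"no .csv file found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            # Assumption: UTF-8 text with a header row naming the columns.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


def frequent_pairs(zip_path, min_count=5):
    """Return (profession, entity, count) for pairs seen at least min_count times."""
    return [
        (row["span_mention_1"], row["span_mention_2"], int(row["co-occurrence"]))
        for row in read_pairs(zip_path)
        if int(row["co-occurrence"]) >= min_count
    ]
```

With the archive downloaded locally, `frequent_pairs("mesinesp2_co-occurrence_pmi.zip", min_count=10)` would keep only the pairs observed at least ten times. If the actual header names differ from the list above, the dictionary keys need adjusting accordingly.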
Notes
This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).
Contact
If you have any questions or suggestions, please contact us at:
- Miguel Rodríguez Ortega
- Martin Krallinger
Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
MEDDOPROF (Corpus of mentions of professions, occupations and working status and normalization, different document collection with some overlapping documents)
MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
Created:
2024-12-02



