AI4PROFHEALTH - Profession-health status co-occurrence graph statistics

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14223004
Resource description:
This dataset contains Pointwise Mutual Information (PMI) values for co-occurrence pairs between mention categories extracted from two clinical datasets: MESINESP2 and the Clinical Case Reports Collection. PMI is a statistical measure of the strength of association between two entities: it compares their observed co-occurrence with the co-occurrence expected if the two were independent. The PMI values are derived from associations between professions and clinical concepts, with the aim of identifying potential occupational health risks.

By sharing these datasets, we aim to support further research into the relationships between professions and clinical entities, enabling the development of more accurate and targeted occupational health risk models.

Data Structure:

The repository contains one .zip file per corpus. Each .zip holds a .csv file with the co-occurrences between the detected professions and clinical entities, together with their PMI values:
- MESINESP2: mesinesp2_co-occurrence_pmi.zip
- Clinical case reports: clinical_cases_co-occurrence_pmi.zip

Each file has the following columns, in order:
- span_mention_1: Mention string (original): profession
- normalized_entity_1: Controlled vocabulary entry for this term
- mention1_category: Semantic class (i.e., NER label)
- mention1_freq: Absolute frequency of entity 1
- span_mention_2: Mention string (original): entity 2 (disease, symptom, species, etc.)
- normalized_entity_2: Controlled vocabulary entry for this term
- mention2_category: Semantic class (i.e., NER label)
- mention2_freq: Absolute frequency of entity 2 (the source description labels this column mention1_freq a second time, presumably a typo)
- co-occurrence: Number of co-occurrences
- PMID: PMID value

Notes

This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).

Contact

If you have any questions or suggestions, please contact:
- Miguel Rodríguez Ortega
- Martin Krallinger

Additional resources and corpora

If you are interested, you might want to check out these corpora and resources:
- MEDDOPROF: Corpus of mentions of professions, occupations and working status, with normalization (different document collection with some overlapping documents)
- MESINESP-2: Corpus of manually indexed records with DeCS/MeSH terms, comprising scientific literature abstracts, clinical trials, and patent abstracts (different document collection)
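To make the PMI definition and the column layout described above concrete, the following is a minimal Python sketch of recomputing PMI from the per-row counts. It is not the authors' pipeline and rests on several assumptions: the description does not state the logarithm base or the total number of contexts used to estimate probabilities, so base 2 and a caller-supplied `total` are assumed here; the second frequency column is assumed to be named `mention2_freq` (the description lists `mention1_freq` twice); and the two sample rows are invented purely for illustration.

```python
import csv
import io
import math


def pmi(cooc: int, freq_1: int, freq_2: int, total: int) -> float:
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) ).

    With p(x, y) = cooc / total, p(x) = freq_1 / total and
    p(y) = freq_2 / total, the ratio simplifies to
    cooc * total / (freq_1 * freq_2).
    """
    if min(cooc, freq_1, freq_2, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(cooc * total / (freq_1 * freq_2))


def pmi_per_row(csv_text: str, total: int, freq_2_column: str = "mention2_freq"):
    """Return (profession, entity, PMI) for each row of a co-occurrence CSV.

    `total` and `freq_2_column` are assumptions, not documented values.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = pmi(
            int(row["co-occurrence"]),
            int(row["mention1_freq"]),
            int(row[freq_2_column]),
            total,
        )
        results.append((row["span_mention_1"], row["span_mention_2"], score))
    return results


# Invented sample rows (hypothetical values, not taken from the dataset).
SAMPLE_CSV = """span_mention_1,normalized_entity_1,mention1_category,mention1_freq,span_mention_2,normalized_entity_2,mention2_category,mention2_freq,co-occurrence
minero,CODE-A,profession,20,silicosis,CODE-B,disease,50,10
minero,CODE-A,profession,20,fiebre,CODE-C,symptom,50,1
"""

if __name__ == "__main__":
    for profession, entity, score in pmi_per_row(SAMPLE_CSV, total=1000):
        print(f"{profession} / {entity}: PMI = {score:.3f}")
```

In this toy example the first pair co-occurs ten times more often than independence would predict, giving a positive PMI, while the second pair co-occurs exactly as often as expected and scores zero. To use the sketch on the real files, the .csv inside each .zip would be read in place of `SAMPLE_CSV`, after checking the actual header names.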
Created:
2024-12-02