Clinical Connections KG version 20240501
Source: Figshare. Updated 2024-12-22. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Clinical_Connections_KG_version_20240501/28079582
Description:
The Clinical Connections KG is created and maintained by the Multiomics Provider team from the Institute for Systems Biology in Seattle, WA. This KP provides a knowledge graph pointing from risk factors to a variety of health outcomes (diseases, phenotypes, medication exposure). We use data from over 28 million Electronic Health Records (EHRs) to train a large collection of interpretable machine learning models, which are integrated into a single large knowledge graph. The edges of the graph are generated by running ~300 logistic regression models for clinical conditions, with features including age, sex, medical conditions, and medications as nodes to predict associations with disease outcome.

The data consist of over 28 million EHR records from Providence Health Systems and Affiliates (PSHA), which cares for patients through 51 hospitals and 1085 clinics across seven states in the US: Alaska, California, Montana, New Mexico, Oregon, Texas, and Washington.

The KG includes results from 152 multivariate logistic regression models, covering 152 conditions, 335 medications, 115 lab measurements, and 5 demographic features. Log odds ratios are used to quantify associations between concepts. The AUROC for each model is provided, along with the 95% confidence intervals and p-values for each association.

Features are encoded as a binary indicator (0/1) for whether or not they are present in a person's medical history. Laboratory features are coded as high/low relative to the reference range at the time the result was entered into the EHR. The specification (1,0) or (0,1) indicates that the lab result was high or low, respectively, while "normal" (as defined by the reference ranges) or the absence of a lab result is mapped to (0,0). Laboratory values that were split into high or low were then mapped from LOINC codes to HPO phenotypes.
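The high/low laboratory encoding described above can be sketched as follows. This is only an illustration of the (high, low) indicator pair, not the team's actual pipeline: the function name `encode_lab` and the idea of passing numeric reference-range bounds are assumptions made for the example.

```python
from typing import Optional, Tuple


def encode_lab(
    value: Optional[float], ref_low: float, ref_high: float
) -> Tuple[int, int]:
    """Return a (high, low) indicator pair for one lab result.

    Hypothetical helper: the name and the numeric reference bounds
    are assumptions, not part of the dataset's published tooling.
    """
    if value is None:
        # No lab result on record is mapped to (0, 0).
        return (0, 0)
    if value > ref_high:
        # Above the reference range: high.
        return (1, 0)
    if value < ref_low:
        # Below the reference range: low.
        return (0, 1)
    # Within the reference range ("normal") is also (0, 0).
    return (0, 0)
```

Note that under this encoding a normal result and a missing result are indistinguishable, so the models cannot separate "tested and normal" from "never tested".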
Demographic features include age groups (0-17, 18-49, 50-74, and 75+ years old), sex (Female = 0), and ethnic group (Hispanic or Latino = 1).

Graph Properties

Disease nodes use Monarch Disease Ontology (MONDO) or Human Phenotype Ontology (HPO) identifiers, depending on the nature of the disease.

Medication nodes use CHEMBL or CHEBI identifiers, depending on the nature of the medication.

Laboratory results use the LOINC2HPO tool to map LOINC codes to HPO identifiers.

Edge predicates are "associated_with_increased_likelihood_of" if the coefficient is positive and "associated_with_decreased_likelihood_of" if the coefficient is negative.

Example edge (interpretation): The KG shows that rosuvastatin is associated with an increased likelihood of chronic ischemic heart disease, with a log odds ratio of 3.4278 and a p-value of
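The predicate rule and the log odds ratio in the example edge can be illustrated with a short sketch. The function names are hypothetical, and raising an error for a coefficient of exactly zero is an assumption, since the description only covers positive and negative coefficients.

```python
import math

INCREASED = "associated_with_increased_likelihood_of"
DECREASED = "associated_with_decreased_likelihood_of"


def edge_predicate(coefficient: float) -> str:
    """Choose the edge predicate from the sign of a regression coefficient."""
    if coefficient > 0:
        return INCREASED
    if coefficient < 0:
        return DECREASED
    # Assumption: the source does not say how a zero coefficient is handled.
    raise ValueError("predicate is undefined for a coefficient of zero")


def odds_ratio(log_odds_ratio: float) -> float:
    """Convert a log odds ratio to an odds ratio by exponentiating."""
    return math.exp(log_odds_ratio)


# The rosuvastatin example edge has a log odds ratio of 3.4278.
print(edge_predicate(3.4278))
print(round(odds_ratio(3.4278), 1))
```

Exponentiating the log odds ratio of 3.4278 gives an odds ratio of about 30.8. As with any regression on observational EHR data, this describes an association in the fitted model and does not by itself indicate that the medication causes the outcome.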
Created:
2024-12-22



