Bayesian Double Feature Allocation for Phenotyping With Electronic Health Records

Source: DataCite Commons. Updated: 2021-09-29. Indexed: 2024-08-17.
Download link:
https://tandf.figshare.com/articles/dataset/Bayesian_Double_Feature_Allocation_for_Phenotyping_with_Electronic_Health_Records/10115876
Description:
Electronic health records (EHR) provide opportunities for deeper understanding of human phenotypes—in our case, latent disease—based on statistical modeling. We propose a categorical matrix factorization method to infer latent diseases from EHR data. A latent disease is defined as an unknown biological aberration that causes a set of common symptoms for a group of patients. The proposed approach is based on a novel double feature allocation model which simultaneously allocates features to the rows and the columns of a categorical matrix. Using a Bayesian approach, available prior information on known diseases (e.g., hypertension and diabetes) greatly improves identifiability and interpretability of the latent diseases. We assess the proposed approach by simulation studies including mis-specified models and comparison with sparse latent factor models. In the application to a Chinese EHR dataset, we identify 10 latent diseases, each of which is shared by groups of subjects with specific health traits related to lipid disorder, thrombocytopenia, polycythemia, anemia, bacterial and viral infections, allergy, and malnutrition. The identification of the latent diseases can help healthcare officials better monitor the subjects’ ongoing health conditions and look into potential risk factors and approaches for disease prevention. We cross-check the reported latent diseases with medical literature and find agreement between our discovery and reported findings elsewhere. We provide an R package “dfa” implementing our method and an R shiny web application reporting the findings. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
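
To make the idea of allocating features to both rows and columns concrete, here is a minimal toy sketch in Python. It is not the authors' model or the API of the "dfa" package. The function names, the inclusion probabilities, and the noise-free sign rule are all assumptions made for illustration, and the Bayesian priors and inference described in the abstract are omitted entirely. Rows stand for patients, columns for symptoms, and latent features for latent diseases.

```python
import random


def sample_double_feature_allocation(n_rows, n_cols, n_features,
                                     p_row=0.3, p_col=0.3, seed=0):
    """Toy sketch (hypothetical helper): allocate latent features to rows and columns."""
    rng = random.Random(seed)
    # A[i][k] = 1 if row (patient) i carries latent feature (disease) k.
    A = [[1 if rng.random() < p_row else 0 for _ in range(n_features)]
         for _ in range(n_rows)]
    # B[j][k] is -1, 0, or +1: feature k lowers, ignores, or raises column (symptom) j.
    B = [[rng.choice([-1, 1]) if rng.random() < p_col else 0
          for _ in range(n_features)]
         for _ in range(n_cols)]
    return A, B


def categorical_matrix(A, B):
    """Assumed noise-free rule: each cell is the sign of the summed feature effects."""
    def sign(x):
        return (x > 0) - (x < 0)

    # Cell (i, j) is -1 (low), 0 (normal), or +1 (high).
    return [[sign(sum(a * b for a, b in zip(row_a, row_b))) for row_b in B]
            for row_a in A]


A, B = sample_double_feature_allocation(n_rows=6, n_cols=4, n_features=3, seed=1)
Y = categorical_matrix(A, B)
```

In this sketch a patient with no latent features has every symptom at the normal level, and a symptom deviates only when the patient and the symptom share at least one feature. That shared-feature structure is what a double feature allocation is meant to recover from the observed matrix.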

Provider:
Taylor & Francis
Created:
2019-10-31