Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA)
收藏Figshare2019-02-13 更新2026-04-29 收录
下载链接:
https://figshare.com/articles/dataset/Using_topic_modeling_via_non-negative_matrix_factorization_to_identify_relationships_between_genetic_variants_and_disease_phenotypes_A_case_study_of_Lipoprotein_a_i_LPA_i_/7715624
下载链接
链接失效反馈官方服务:
资源简介:
Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data of 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF from 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P LPA and a topic enriched for lung cancer (P
创建时间:
2019-02-13



