
Results when predicting eQTL test data.

Source: Figshare. Updated: 2025-06-10. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Results_when_predicting_eQTL_test_data_/29283545
Description:
Genome-wide association studies (GWAS) have successfully uncovered numerous associations between genetic variants and disease traits to date. Yet, identifying significantly associated loci remains a considerable challenge due to the concomitant multiple-testing burden of performing such analyses genome-wide. Here, we leverage the genetic associations of molecular traits – DNA CpG-site methylation status and RNA expression – to mitigate this problem. We encode their co-association across the genome using PinSage, a graph convolutional neural network-based recommender system previously deployed at Pinterest. We demonstrate, using this framework, that a model trained only on methylation quantitative trait locus (QTL) data could recapitulate over half (554,209/1,021,052) of possible SNP-RNA associations identified in a large expression QTL meta-analysis. Taking advantage of a recent ‘saturated’ map of height associations, we then show that height-associated loci predicted by a model trained on molecular-QTL data replicated comparably, following Bonferroni correction, to those that were genome-wide significant in UK Biobank (88% compared to 91%). On a set of 64 disease outcomes in UK Biobank, the same model identified 143 independent novel disease associations, with at least one additional association for 64% (41/64) of the disease outcomes examined. Excluding associations involving the MHC region, we achieve a total uplift of over 8% (128/1,548). We successfully replicated 38% (39/103) of the novel disease associations in an independent sample, with suggestive evidence for six additional associations from GWAS Catalog. Replicated associations included, for instance, that between rs10774625 (nearest gene: SH2B3/ATXN2) and coeliac disease, and that between rs12350420 (nearest gene: MVB12B) and glaucoma. For many GWAS, attaining such an enhancement by simply increasing sample size may be prohibitively expensive, or impossible, depending on disease prevalence.
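The proportions quoted in the description can be checked with simple arithmetic. The snippet below is a minimal sketch, not part of the dataset: the counts are copied from the description, while the dictionary labels and the `percent` helper are illustrative names introduced here.

```python
# Counts quoted in the description, stored as (numerator, denominator).
# The labels are illustrative and do not come from the dataset itself.
reported = {
    "eQTL associations recapitulated": (554_209, 1_021_052),
    "disease outcomes with a new association": (41, 64),
    "uplift excluding the MHC region": (128, 1_548),
    "novel associations replicated": (39, 103),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100 * numerator / denominator


# Compute each proportion and print it to one decimal place.
percentages = {label: percent(n, d) for label, (n, d) in reported.items()}
for label, value in percentages.items():
    print(f"{label}: {value:.1f}%")
```

Each computed value can be compared against the wording in the description: "over half", 64%, "over 8%", and 38%.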
Created: 2025-06-10