On the cross-population generalizability of gene expression prediction models
Source: DataCite Commons. Updated: 2026-03-04. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.7272/Q6RN362Z
Description:
The genetic control of gene expression is a core component of human
physiology. For the past several years, transcriptome-wide association
studies have leveraged large datasets of linked genotype and RNA
sequencing information to create a powerful gene-based test of association
that has been used in dozens of studies. While numerous discoveries have
been made, the populations in the training data are overwhelmingly of
European descent, and little is known about the generalizability of these
models to other populations. Here, we test for cross-population
generalizability of gene expression prediction models using a dataset of
African American individuals with RNA-Seq data in whole blood. We find
that the default models trained in large datasets such as GTEx and DGN
fare poorly in African Americans, with a notable reduction in prediction
accuracy when compared to European Americans. We replicate these
limitations in cross-population generalizability using the five
populations in the GEUVADIS dataset. Via realistic simulations of both
populations and gene expression, we show that accurate cross-population
generalizability of transcriptome prediction only arises when eQTL
architecture is substantially shared across populations. In contrast,
models with non-identical eQTLs showed patterns similar to real-world
data. Therefore, generating RNA-Seq data in diverse populations is a
critical step towards multi-ethnic utility of gene expression prediction.
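
The simulation finding above can be illustrated with a toy sketch. This is not the authors' simulation framework. It assumes independent SNPs (no linkage disequilibrium), equal allele frequencies in both populations, additive effects, and per-SNP marginal regression as the prediction model. All names and parameter values (simulate, marginal_weights, 10 SNPs, effect size 0.5) are hypothetical choices made for illustration.

```python
import random

def simulate(n, freqs, betas, rng, noise_sd=1.0):
    """Simulate genotypes (0/1/2 allele counts) and expression for n individuals."""
    genotypes, expression = [], []
    for _ in range(n):
        g = [(rng.random() < f) + (rng.random() < f) for f in freqs]
        genotypes.append(g)
        expression.append(sum(b * x for b, x in zip(betas, g)) + rng.gauss(0, noise_sd))
    return genotypes, expression

def mean(values):
    return sum(values) / len(values)

def marginal_weights(genotypes, expression):
    """Per-SNP regression slopes. Usable as joint weights here only
    because the SNPs are simulated independently (an assumption)."""
    my = mean(expression)
    weights = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mx = mean(col)
        sxx = sum((x - mx) ** 2 for x in col)
        sxy = sum((x - mx) * (y - my) for x, y in zip(col, expression))
        weights.append(sxy / sxx if sxx > 0 else 0.0)
    return weights

def r_squared(pred, obs):
    """Squared Pearson correlation between predicted and observed expression."""
    mp, mo = mean(pred), mean(obs)
    spp = sum((p - mp) ** 2 for p in pred)
    soo = sum((o - mo) ** 2 for o in obs)
    spo = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    return 0.0 if spp == 0 or soo == 0 else spo ** 2 / (spp * soo)

rng = random.Random(0)
freqs = [0.3] * 10                        # hypothetical allele frequencies
train_betas = [0.5] * 5 + [0.0] * 5       # eQTLs at SNPs 0-4 in the training population
partial_betas = [0.0] * 3 + [0.5] * 5 + [0.0] * 2  # eQTLs at SNPs 3-7: only two shared

# Train once in the "training" population.
G_train, y_train = simulate(2000, freqs, train_betas, rng)
weights = marginal_weights(G_train, y_train)

def evaluate(target_betas):
    """Apply the trained weights to a fresh target population."""
    G, y = simulate(1000, freqs, target_betas, rng)
    pred = [sum(w * x for w, x in zip(weights, g)) for g in G]
    return r_squared(pred, y)

r2_shared = evaluate(train_betas)      # identical eQTL architecture
r2_partial = evaluate(partial_betas)   # partially shared architecture
print(f"R^2, shared eQTLs:           {r2_shared:.3f}")
print(f"R^2, partially shared eQTLs: {r2_partial:.3f}")
```

Under these assumptions, prediction accuracy in the target population should be clearly lower when only some eQTLs are shared, which is the qualitative pattern the abstract describes. Real data add linkage disequilibrium and allele frequency differences, which this sketch deliberately leaves out.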
Provider:
Dryad
Created:
2020-08-06



