
On the cross-population generalizability of gene expression prediction models

Source: DataONE. Updated: 2020-08-06. Indexed: 2025-07-19.
Download link:
https://search.dataone.org/view/sha256:3b881c372975748f3692ce9989327bf0dc3541fb5a9efbe9f5a98b699e4fed17
Description:
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVA...
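The comparison the description refers to can be illustrated with a minimal sketch. It assumes prediction accuracy is measured as the squared Pearson correlation between predicted and measured expression for a single gene, which the description does not specify. The function names, the population labels `EUR` and `AFA`, and the toy numbers are all hypothetical and are not results from the dataset.

```python
from math import sqrt  # not strictly required; kept for readers extending the sketch


def pearson_r2(predicted, observed):
    """Squared Pearson correlation between predicted and measured expression."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        # A constant vector carries no information; report zero accuracy.
        return 0.0
    return cov * cov / (var_p * var_o)


def accuracy_by_population(samples):
    """Group (label, predicted, observed) triples by label and score each group."""
    grouped = {}
    for label, pred, obs in samples:
        preds, obss = grouped.setdefault(label, ([], []))
        preds.append(pred)
        obss.append(obs)
    return {label: pearson_r2(p, o) for label, (p, o) in grouped.items()}


# Toy values invented for illustration only (hypothetical, not study data).
toy = [
    ("EUR", 1.0, 1.1), ("EUR", 2.0, 1.9), ("EUR", 3.0, 3.2),
    ("AFA", 1.0, 2.0), ("AFA", 2.0, 1.0), ("AFA", 3.0, 2.5),
]
r2 = accuracy_by_population(toy)
for label, value in sorted(r2.items()):
    print(f"{label}: R^2 = {value:.3f}")
```

In practice the same per-gene score would be computed across many genes and the distributions compared between populations; the single-gene form here only shows the shape of the calculation.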

Created:
2025-06-29