
GWAS summary statistics for 9 quantitative phenotypes from the UK Biobank (5-fold cross-validation)

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14270952
Description:
This dataset contains GWAS summary statistics for 9 quantitative phenotypes from the UK Biobank. The dataset is designed to enable systematic PRS analyses with 5-fold cross validation. For each phenotype and fold, we provide GWAS summary statistics for the training, validation, and test sets. The validation summary statistics can be used for model selection/tuning. The test summary statistics can be used to evaluate PRS models via pseudo-validation metrics. Association testing for all phenotypes and samples was done with plink2.

The phenotypes included in this dataset are:

    HEIGHT: Standing height (Data-Field: 50)
    BMI: Body mass index (Data-Field: 21001)
    WC: Waist circumference (Data-Field: 48)
    HC: Hip circumference (Data-Field: 49)
    BW: Birth weight (Data-Field: 20022)
    FVC: Forced vital capacity (Data-Field: 3062)
    FEV1: Forced expiratory volume in 1-second (Data-Field: 3063)
    HDL: HDL cholesterol (Data-Field: 30760)
    LDL: LDL cholesterol (Data-Field: 30780)

To allow users to assess PRS performance as a function of sample size, we also provide subsampled training GWAS summary statistics. This is done by taking the training samples and randomly selecting (without replacement) a subset of them for conducting association testing. The training sample sizes are:

    N = 5000
    N = 10000
    N = 20000
    N = 40000
    N = 80000
    N = 160000
    Full training set (sample size varies by phenotype).

NOTE: Due to the smaller overall sample size for the Birth weight phenotype, we do not include training data for the `N=160000` setting.

The folder structure of the GWAS data for each phenotype is as follows:

    train
        N_5000
            fold_1
                chr_1.PHENO1.glm.linear
                chr_2.PHENO1.glm.linear
                ...
            fold_2
            fold_3
            ...
        N_10000
        N_20000
        N_40000
        N_80000
        N_160000
        full
    validation
        fold_1
            chr_1.PHENO1.glm.linear
            chr_2.PHENO1.glm.linear
            ...
        fold_2
        fold_3
        ...
    test
        fold_1
        fold_2
        fold_3
        ...
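
As an illustration of how the folder structure described above might be navigated, here is a minimal Python sketch. It rests on several assumptions that the description does not state: that `root` points to the extracted folder of a single phenotype (the name `HEIGHT` in the usage note is hypothetical), that the `test` folds contain files named like those in `train` and `validation`, and that the files keep plink2's default tab-delimited layout with a single header row. The helper names `sumstats_path` and `read_sumstats` are invented for this example and are not part of the dataset.

```python
import csv
from pathlib import Path

# Training-size folders listed in the description; "full" is the full training set.
# Per the note above, N_160000 is not provided for Birth weight (BW).
TRAIN_SIZES = ["N_5000", "N_10000", "N_20000", "N_40000",
               "N_80000", "N_160000", "full"]


def sumstats_path(root, split, fold, chrom, train_size=None):
    """Build the path to one per-chromosome summary statistics file.

    `root` is assumed to be the extracted folder of a single phenotype.
    `train_size` is only used (and required) when split == "train".
    """
    if split not in ("train", "validation", "test"):
        raise ValueError(f"unknown split: {split!r}")
    base = Path(root) / split
    if split == "train":
        if train_size not in TRAIN_SIZES:
            raise ValueError(f"unknown training size: {train_size!r}")
        base = base / train_size
    return base / f"fold_{fold}" / f"chr_{chrom}.PHENO1.glm.linear"


def read_sumstats(path):
    """Read one tab-delimited file into a list of dicts keyed by the header row.

    All values are returned as strings; convert columns as needed.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `sumstats_path("HEIGHT", "train", 1, 22, train_size="N_5000")` resolves to `HEIGHT/train/N_5000/fold_1/chr_22.PHENO1.glm.linear`. The exact column names in each file depend on the plink2 options used, so it is safest to inspect the header row before relying on any particular column.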
For more details about the GWAS study, Quality Control (QC) criteria, or other information, please consult our publication:

Zabad, S., Gravel, S., & Li, Y. (2023). Fast and accurate Bayesian polygenic risk modeling with variational inference. The American Journal of Human Genetics, 110(5), 741–761. https://doi.org/10.1016/j.ajhg.2023.03.009

If you use this data in your work, please cite the publication above.
Created:
2025-02-06