GWAS summary statistics for 9 quantitative phenotypes from the UK Biobank (5-fold cross-validation)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14270952
Description:
This dataset contains GWAS summary statistics for 9 quantitative phenotypes from the UK Biobank.
The dataset is designed to enable systematic PRS analyses with 5-fold cross-validation. For each phenotype and fold, we provide GWAS summary statistics for the training, validation, and test sets. The validation summary statistics can be used for model selection and tuning. The test summary statistics can be used to evaluate PRS models via pseudo-validation metrics. Association testing for all phenotypes and samples was performed with plink2.
The phenotypes included in this dataset are:
HEIGHT: Standing height (Data-Field: 50)
BMI: Body mass index (Data-Field: 21001)
WC: Waist circumference (Data-Field: 48)
HC: Hip circumference (Data-Field: 49)
BW: Birth weight (Data-Field: 20022)
FVC: Forced vital capacity (Data-Field: 3062)
FEV1: Forced expiratory volume in 1-second (Data-Field: 3063)
HDL: HDL cholesterol (Data-Field: 30760)
LDL: LDL cholesterol (Data-Field: 30780)
To allow users to assess PRS performance as a function of sample size, we also provide subsampled training GWAS summary statistics. These are generated by randomly selecting (without replacement) a subset of the training samples and running association testing on that subset only. The training sample sizes are:
N = 5000
N = 10000
N = 20000
N = 40000
N = 80000
N = 160000
Full training set (sample size varies by phenotype).
NOTE: Due to the smaller overall sample size for the Birth weight phenotype, we do not include training data for the `N=160000` setting.
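As a rough illustration of the subsampling scheme described above, the following sketch draws subsets of a training sample list without replacement. It is not the authors' pipeline: the function names, the `seed` parameter, and the choice to draw each size independently (rather than as nested subsets) are assumptions made for this example. Sizes larger than the available sample count are skipped, mirroring the Birth weight note.

```python
import random

# Subsample sizes listed in the dataset description.
TRAIN_SIZES = [5000, 10000, 20000, 40000, 80000, 160000]


def subsample_ids(sample_ids, n, seed=0):
    """Randomly pick n sample IDs without replacement (hypothetical helper)."""
    ids = list(sample_ids)
    if n > len(ids):
        raise ValueError(f"cannot draw {n} samples from {len(ids)}")
    # random.Random(seed).sample never returns the same element twice.
    return random.Random(seed).sample(ids, n)


def training_subsets(sample_ids, sizes=TRAIN_SIZES, seed=0):
    """Return {size: subset}, skipping sizes that exceed the training set.

    Assumption: each size is drawn independently from the full training set.
    """
    ids = list(sample_ids)
    return {n: subsample_ids(ids, n, seed=seed) for n in sizes if n <= len(ids)}
```

Each resulting subset would then be passed to the association-testing step in place of the full training set.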
The folder structure of the GWAS data for each phenotype is as follows:
train
    N_5000
        fold_1
            chr_1.PHENO1.glm.linear
            chr_2.PHENO1.glm.linear
            ...
        fold_2
        fold_3
        ...
    N_10000
    N_20000
    N_40000
    N_80000
    N_160000
    full
validation
    fold_1
        chr_1.PHENO1.glm.linear
        chr_2.PHENO1.glm.linear
        ...
    fold_2
    fold_3
    ...
test
    fold_1
    fold_2
    fold_3
    ...
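To make the layout above easier to navigate programmatically, here is a minimal sketch that builds the path to one per-chromosome file and reads it. The helper names and the `pheno_dir` argument (the directory holding one phenotype's `train`, `validation`, and `test` folders) are hypothetical. The reader assumes the files are tab-delimited with a header row, which is the usual plink2 `--glm` output format; check this against the downloaded files.

```python
import csv
from pathlib import Path


def sumstats_path(pheno_dir, split, fold, chrom, train_size=None):
    """Build the path to one per-chromosome file under the listed layout."""
    base = Path(pheno_dir) / split
    if split == "train":
        if train_size is None:
            raise ValueError("train_size is required for the train split")
        # Subsampled sets live in N_<size>; the complete set lives in 'full'.
        base = base / ("full" if train_size == "full" else f"N_{train_size}")
    elif split not in ("validation", "test"):
        raise ValueError(f"unknown split: {split!r}")
    return base / f"fold_{fold}" / f"chr_{chrom}.PHENO1.glm.linear"


def read_sumstats(path):
    """Read a tab-delimited summary statistics file into a list of dicts.

    Assumption: the first line is a header row naming the columns.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `sumstats_path("HEIGHT", "train", fold=1, chrom=1, train_size=5000)` points at `HEIGHT/train/N_5000/fold_1/chr_1.PHENO1.glm.linear`, where `HEIGHT` stands in for wherever that phenotype's data was extracted.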
For more details about the GWAS, the quality control (QC) criteria, or other information, please consult our publication:
Zabad, S., Gravel, S., & Li, Y. (2023). Fast and accurate Bayesian polygenic risk modeling with variational inference. The American Journal of Human Genetics, 110(5), 741–761. https://doi.org/10.1016/j.ajhg.2023.03.009
If you use this data in your work, please cite the publication above.
Created:
2025-02-06



