Comparing mixed models and Random Forest association tests using naturalGWAS and a Striped Bass SNP dataset
Source: DataCite Commons. Updated 2025-06-01. Indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.zw3r22872
Description:
In this study, we used the phenotype simulation package naturalGWAS to
test the performance of Zhao’s Random Forest method in comparison to an
uncorrected Random Forest test, latent factor mixed models (LFMM),
genome-wide efficient mixed models (GEMMA), and confounder-adjusted linear
regression (CATE). We created 400 sets of phenotypes, corresponding to
five effect sizes and 2, 5, 15, or 30 causal loci, simulated from two
empirical datasets containing SNPs from Striped Bass representing three
and 13 populations. All association methods were evaluated for their
ability to detect genotype-phenotype associations based on power, false
discovery rates, and number of false positives. Genomic inflation was
highest for uncorrected Random Forest and LFMM tests and lowest for GEMMA
and Zhao’s Random Forest. All association tests had similar power to
detect causal loci, and Zhao’s Random Forest had the lowest false
discovery rate in all scenarios. To measure the performance of association
tests in small datasets with few loci surrounding a causal gene, we also
reran the analyses after removing causal loci from each dataset. All
association tests found true positives, defined as loci
located within 30 kb of a causal locus, in only 3%–18% of simulations. In
contrast, at least one false positive was found in 17%–44% of simulations.
Zhao’s Random Forest again identified the fewest false positives of all
association tests studied. The ability to test the power of association
tests on individual empirical datasets can be a useful first
step when designing a GWAS.
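
The evaluation metrics named in the description (power, false discovery rate, number of false positives, and genomic inflation) can be illustrated with a minimal Python sketch. It is not the authors' code and does not use naturalGWAS. The function names are hypothetical, positions are assumed to lie on a single chromosome, and power is assumed to mean the fraction of causal loci with at least one significant locus inside the 30 kb window. The genomic inflation factor is computed in the conventional way, as the median 1-df chi-square statistic implied by the p-values divided by its expected median under the null (about 0.455).

```python
from statistics import NormalDist, median

# From the description: a significant locus within 30 kb of a causal locus
# counts as a true positive.
WINDOW_BP = 30_000


def classify_hits(hit_positions, causal_positions, window_bp=WINDOW_BP):
    """Split significant loci into true and false positives by distance.

    Assumption: all positions are base-pair coordinates on one chromosome.
    """
    true_pos, false_pos = [], []
    for hit in hit_positions:
        near_causal = any(abs(hit - c) <= window_bp for c in causal_positions)
        (true_pos if near_causal else false_pos).append(hit)
    return true_pos, false_pos


def power_fdr_false_positives(hit_positions, causal_positions, window_bp=WINDOW_BP):
    """Return (power, false discovery rate, number of false positives).

    Assumption: power is the fraction of causal loci that have at least one
    significant locus within the window.
    """
    _, false_pos = classify_hits(hit_positions, causal_positions, window_bp)
    detected = sum(
        any(abs(hit - c) <= window_bp for hit in hit_positions)
        for c in causal_positions
    )
    power = detected / len(causal_positions) if causal_positions else 0.0
    fdr = len(false_pos) / len(hit_positions) if hit_positions else 0.0
    return power, fdr, len(false_pos)


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1].

    Each p-value is converted to a 1-df chi-square statistic via the squared
    normal quantile; the median is divided by the null expectation.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), ~0.455
    return median(chi2) / expected_median
```

With significant loci at 10,000, 45,000, and 500,000 bp and causal loci at 20,000 and 300,000 bp, the first two hits fall inside the window of the first causal locus and the third is a false positive, so one of two causal loci is detected. A genomic inflation factor near 1 indicates well-calibrated p-values; values well above 1 suggest residual confounding such as uncorrected population structure.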
Provider:
Dryad
Created:
2022-08-29



