
Comparing mixed models and Random Forest association tests using naturalGWAS and a Striped Bass SNP dataset

Source: DataCite Commons. Updated 2025-06-01; indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.zw3r22872
Description:
In this study, we used the phenotype simulation package naturalGWAS to test the performance of Zhao’s Random Forest method in comparison to an uncorrected Random Forest test, latent factor mixed models (LFMM), genome-wide efficient mixed models (GEMMA), and confounder-adjusted linear regression (CATE). We created 400 sets of phenotypes, corresponding to five effect sizes and 2, 5, 15, or 30 causal loci, simulated from two empirical datasets containing SNPs from Striped Bass representing three and 13 populations. All association methods were evaluated for their ability to detect genotype-phenotype associations based on power, false discovery rate, and number of false positives. Genomic inflation was highest for the uncorrected Random Forest and LFMM tests and lowest for GEMMA and Zhao’s Random Forest. All association tests had similar power to detect causal loci, and Zhao’s Random Forest had the lowest false discovery rate in all scenarios. To measure the performance of association tests in small datasets with few loci surrounding a causal gene, we reran the analyses after removing the causal loci from each dataset. The association tests found true positives, defined as loci located within 30 kb of a causal locus, in only 3%–18% of simulations. In contrast, at least one false positive was found in 17%–44% of simulations. Zhao’s Random Forest again identified the fewest false positives of all association tests studied. The ability to test the power of association tests on an individual empirical dataset can be an extremely useful first step when designing a GWAS.
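The evaluation criteria named in the description (power, false discovery rate, number of false positives, and genomic inflation) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use naturalGWAS. The function names `evaluate_hits` and `genomic_inflation`, the `(chromosome, position)` input format, and the exact definitions of power and false discovery rate are assumptions made for illustration; only the 30 kb window comes from the description. The genomic inflation factor is computed in the conventional way, as the median 1-df chi-square statistic divided by its expected median under the null.

```python
from statistics import NormalDist, median


def evaluate_hits(significant, causal, window_bp=30_000):
    """Score one simulation (hypothetical helper, not from the study).

    significant, causal: lists of (chromosome, position) tuples.
    A significant locus counts as a true positive when it lies within
    window_bp of a causal locus on the same chromosome.
    """
    def within(locus, others):
        chrom, pos = locus
        return any(c == chrom and abs(pos - p) <= window_bp for c, p in others)

    false_pos = [s for s in significant if not within(s, causal)]
    detected = [c for c in causal if within(c, significant)]

    # Assumed definitions: power = share of causal loci tagged by a hit,
    # FDR = share of significant loci that are false positives.
    power = len(detected) / len(causal) if causal else 0.0
    fdr = len(false_pos) / len(significant) if significant else 0.0
    return {"power": power, "fdr": fdr, "false_positives": len(false_pos)}


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from p-values in (0, 1]."""
    norm = NormalDist()
    # A 1-df chi-square statistic is the square of a standard normal quantile.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median
```

A value of lambda near 1 indicates well-calibrated p-values, while values well above 1 suggest inflation, for example from uncorrected population structure.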
Provider:
Dryad
Created:
2022-08-29