Comparing mixed models and Random Forest association tests using naturalGWAS and a Striped Bass SNP dataset

Name: Comparing mixed models and Random Forest association tests using naturalGWAS and a Striped Bass SNP dataset
Creator: Dryad
Published: 2025-06-01 03:16:50
License: No description provided

DataCite Commons | Updated 2025-06-01 | Indexed 2025-04-10

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.zw3r22872

Description:

In this study, we used the phenotype simulation package naturalGWAS to test the performance of Zhao’s Random Forest method in comparison to an uncorrected Random Forest test, latent factor mixed models (LFMM), genome-wide efficient mixed models (GEMMA), and confounder-adjusted linear regression (CATE). We created 400 sets of phenotypes, corresponding to five effect sizes and 2, 5, 15, or 30 causal loci, simulated from two empirical datasets containing SNPs from Striped Bass representing three and 13 populations. All association methods were evaluated for their ability to detect genotype-phenotype associations based on power, false discovery rate, and number of false positives. Genomic inflation was highest for the uncorrected Random Forest and LFMM tests and lowest for GEMMA and Zhao’s Random Forest. All association tests had similar power to detect causal loci, and Zhao’s Random Forest had the lowest false discovery rate in all scenarios. To measure the performance of association tests in small datasets with few loci surrounding a causal gene, we also reran the analyses after removing the causal loci from each dataset. Across association tests, true positives, defined as loci located within 30,000 bp of a causal locus, were found in only 3%–18% of simulations. In contrast, at least one false positive was found in 17%–44% of simulations. Zhao’s Random Forest again identified the fewest false positives of all association tests studied. The ability to test the power of association tests on an individual empirical dataset can be an extremely useful first step when designing a GWAS.
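The evaluation criteria named above (power, false discovery rate, number of false positives, and genomic inflation) can be illustrated with a minimal Python sketch. This is not the code used in the study or part of naturalGWAS. The function names, the `(chromosome, position)` input format, and the definition of power as the fraction of causal loci with at least one significant locus inside the window are assumptions made for illustration. Only the 30,000 bp window comes from the description.

```python
from statistics import NormalDist, median

WINDOW_BP = 30_000  # window taken from the description; all else is illustrative


def evaluate_hits(significant, causal, window_bp=WINDOW_BP):
    """Summarise one simulation (hypothetical helper).

    significant, causal: lists of (chromosome, position) tuples.
    A significant locus counts as a true positive if it lies within
    window_bp of any causal locus on the same chromosome.
    """
    def near(locus, others):
        chrom, pos = locus
        return any(c == chrom and abs(pos - p) <= window_bp for c, p in others)

    false_pos = [s for s in significant if not near(s, causal)]
    detected = [c for c in causal if near(c, significant)]

    # Assumed definitions: power is per causal locus, FDR is per significant locus.
    power = len(detected) / len(causal) if causal else 0.0
    fdr = len(false_pos) / len(significant) if significant else 0.0
    return {"power": power, "fdr": fdr, "false_positives": len(false_pos)}


def genomic_inflation(p_values):
    """Genomic inflation factor: median observed chi-square(1) statistic
    divided by the expected median under the null. Requires 0 < p <= 1."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z**2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / expected_median


# Made-up loci, for illustration only.
example = evaluate_hits(
    significant=[("chr1", 1_000), ("chr1", 500_000), ("chr2", 20_000)],
    causal=[("chr1", 20_000), ("chr3", 100)],
)
```

An inflation factor near 1 indicates well-calibrated p-values, and values well above 1 suggest uncorrected confounding such as population structure. The p-values passed to `genomic_inflation` would come from whichever association test is being assessed.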

Provider:

Dryad

Created:

2022-08-29
