Optimizing sampling design for landscape genomics

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.63xsj3v8s


Description:

Landscape genomic approaches for detecting genotype-environment associations (GEA), isolation by distance (IBD), and isolation by environment (IBE) have seen a dramatic increase in use, but there have been few thorough analyses of the influence of sampling strategy on their performance under realistic genomic and environmental conditions. We simulated 24,000 datasets across a range of scenarios with complex population dynamics and realistic landscape structure to evaluate the effects of the spatial distribution and number of samples on common landscape genomics methods. Our results show that common analyses are relatively robust to sampling scheme as long as sampling covers enough environmental and geographic space. We found that for detecting adaptive loci and estimating IBE, sampling schemes that were explicitly designed to increase coverage of available environmental space matched or outperformed sampling schemes that only considered geographic space. When sampling does not cover adequate geographic and environmental space, such as with transect-based sampling, we detected fewer adaptive loci and had higher error when estimating IBD and IBE. We found that IBD could be detected with as few as nine sampling sites, while large sample sizes (e.g., greater than 100 individuals) were crucial for detecting adaptive loci and IBE. We also demonstrate that, even with optimal sampling strategies, landscape genomic analyses are highly sensitive to landscape structure and migration: when spatial autocorrelation and migration are weak, common GEA methods fail to detect adaptive loci.

Methods:

This dataset was generated from simulations run in Python version 3.9.7 (Van Rossum & Drake, 2009) using Geonomics version 1.3.9 (Terasaki Hart et al., 2021). We ran simulations varying population size, migration rate, selection strength, spatial autocorrelation, and environmental correlation, each at a “low” and “high” level. We ran 10 replicates of each simulation to capture stochastic variation in the results. Across three sets of simulated landscapes, this gave 30 repetitions of each of the 32 unique parametrizations (2^5 combinations of the five parameters), for a total of 960 simulations. The dataset is a compressed tarball (.tar.gz) containing 960 pairs of CSV and Variant Call Format (VCF) files, one pair per simulation, with the genomic data for that simulation. A complete description of the methods used to collect and process this dataset is available in the corresponding paper (Bishop et al., 2024). The code used to create these simulations is archived on Zenodo (DOI 10.5281/zenodo.14009716).
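The simulation counts in the Methods text can be checked by enumerating the design. The sketch below is illustrative only: the parameter keys, the `landscape_set` and `replicate` fields, and the `enumerate_design` function are hypothetical names chosen here, not Geonomics parameter names or file names from the archive, and the numeric values behind "low" and "high" are not given in this description.

```python
from itertools import product

# Hypothetical keys for the five varied parameters named in the Methods text.
PARAMETERS = [
    "population_size",
    "migration_rate",
    "selection_strength",
    "spatial_autocorrelation",
    "environmental_correlation",
]
LEVELS = ("low", "high")   # the only levels stated in the description
N_LANDSCAPE_SETS = 3       # three sets of simulated landscapes
N_REPLICATES = 10          # replicates per simulation


def enumerate_design():
    """Return one dict per simulation in the full factorial design."""
    runs = []
    for levels in product(LEVELS, repeat=len(PARAMETERS)):
        params = dict(zip(PARAMETERS, levels))
        for landscape_set in range(1, N_LANDSCAPE_SETS + 1):
            for replicate in range(1, N_REPLICATES + 1):
                runs.append(
                    {**params, "landscape_set": landscape_set, "replicate": replicate}
                )
    return runs


design = enumerate_design()
n_parametrizations = len(LEVELS) ** len(PARAMETERS)
repetitions_per_parametrization = len(design) // n_parametrizations

print(n_parametrizations, repetitions_per_parametrization, len(design))
# prints: 32 30 960
```

Each entry in `design` corresponds to one of the 960 simulations, and so to one CSV/VCF pair in the tarball; how the archive actually names those files is not stated here.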

Created:

2024-11-18
