
Data from: Bioinformatic processing of RAD-seq data dramatically impacts downstream population genetic inference

Mendeley Data · Updated 2024-06-25 · Indexed 2024-06-27
Download link:
https://zenodo.org/records/5012373
Description:
Restriction site-associated DNA sequencing (RAD-seq) provides high-resolution population genomic data at low cost and has become an important component of ecological and evolutionary studies. As with all high-throughput technologies, analytical strategies require critical validation to ensure accurate and unbiased interpretation. To test the impact of bioinformatic data processing on downstream population genetic inference, we analysed mammalian RAD-seq data (>100 individuals) with 312 combinations of methodology (de novo assembly vs. mapping to references of increasing divergence) and filtering criteria (missing data, HWE, FIS, coverage, mapping, genotype quality). To identify commonalities and biases across all pipelines, we computed summary statistics (number of loci, number of SNPs, π, observed heterozygosity Hobs, FIS, FST, effective population size Ne, migration rate m) and compared the results to independent null expectations (isolation-by-distance correlation, the expected transition-to-transversion ratio Ts/Tv, and Mendelian mismatch rates in known parent-offspring trios). We observed large differences between reference-based and de novo approaches, with the former generally calling more SNPs and yielding lower FIS and Ts/Tv. Levels of data completeness had little impact on most summary statistics, and FST estimates were robust across all pipelines. The site-frequency spectrum (SFS) was highly sensitive to the chosen approach, as reflected in the large variance of parameter estimates across demographic scenarios (single-population bottlenecks and an isolation-with-migration model). Null expectations were best met by reference-based approaches, though this depended on the specific criteria applied. We recommend that RAD-seq studies use reference-based approaches with a closely related genome and, given the high stochasticity associated with pipeline choice, advocate the use of multiple pipelines to ensure robust population genetic and demographic inferences.
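Three of the checks named in the description (Ts/Tv, Mendelian mismatch rate in parent-offspring trios, and FIS) are simple enough to sketch directly. The snippet below is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, and it assumes biallelic SNPs given as uppercase (ref, alt) base pairs and genotypes coded as alternate-allele dosage (0, 1, 2) with `None` for a missing call. The FIS shown is the simple single-locus 1 - Hobs/Hexp, not the bias-corrected estimators that population genetics software usually reports.

```python
"""Illustrative sketch only; not the authors' pipeline.

Assumes biallelic SNPs, uppercase (ref, alt) bases, and genotypes coded as
alternate-allele dosage (0, 1, 2) with None for a missing call.
"""
from typing import Optional, Sequence

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv_ratio(snps: Sequence[tuple]) -> float:
    """Ratio of transitions to transversions among (ref, alt) SNP pairs."""
    ts = sum(1 for ref, alt in snps if frozenset((ref, alt)) in TRANSITIONS)
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def _transmissible(dosage: int) -> set:
    """Alternate-allele counts (0 or 1) a parent with this dosage can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[dosage]


def mendelian_mismatch_rate(trios: Sequence[tuple]) -> float:
    """Fraction of fully called (mother, father, child) genotypes that
    cannot arise under Mendelian inheritance."""
    checked = errors = 0
    for mother, father, child in trios:
        if None in (mother, father, child):
            continue  # skip trios with any missing call
        possible = {a + b for a in _transmissible(mother) for b in _transmissible(father)}
        checked += 1
        if child not in possible:
            errors += 1
    return errors / checked if checked else float("nan")


def fis(genotypes: Sequence[Optional[int]]) -> float:
    """Single-locus F_IS = 1 - H_obs / H_exp with H_exp = 2p(1 - p)
    (no sample-size correction)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    n = len(called)
    p = sum(called) / (2 * n)  # alternate-allele frequency
    h_exp = 2 * p * (1 - p)
    h_obs = sum(1 for g in called if g == 1) / n
    return 1 - h_obs / h_exp if h_exp else float("nan")


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]))  # 3.0
    print(mendelian_mismatch_rate([(0, 0, 0), (0, 0, 1), (1, 2, 2), (2, 2, 1), (None, 1, 1)]))  # 0.5
    print(round(fis([0, 1, 1, 2, 2, None]), 3))  # 0.167
```

In the trio example, (0, 0, 1) and (2, 2, 1) are the two violations out of four fully called trios. Because an excess of false heterozygous calls lowers FIS and inflates mismatches, comparing such metrics across pipelines is one way to see the kind of pipeline sensitivity the description reports.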
Created:
2023-06-28