Data from: Batch effects in a multi-year sequencing study: false biological trends due to changes in read lengths
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.8vm8d
Description:
High-throughput sequencing is a powerful tool, but it suffers from biases and
errors that must be accounted for to prevent false biological conclusions.
Such errors include batch effects: technical errors present only in
subsets of data due to procedural changes within a study. If batch effects
are overlooked and multiple batches of data are combined, spurious biological signals can
arise, particularly if batches of data are correlated with biological
variables. Batch effects can be minimized through randomisation of sample
groups across batches. However, in long-term or multi-year studies where
data are added incrementally, full randomisation is impossible and batch
effects may be a common feature. Here we present a case study where false
signals of selection were detected due to a batch effect in a multi-year
study of Alpine ibex (Capra ibex). The batch effect arose because
sequencing read length changed over the course of the project and
populations were added incrementally to the study, resulting in non-random
distributions of populations across read lengths. The differences in read
length caused small misalignments in a subset of the data, leading to
false variant alleles and thus false SNPs. Pronounced allele frequency
differences between populations arose at these SNPs because of the
correlation between read length and population. This created highly
statistically significant, but biologically spurious, signals of selection
and false associations between allele frequencies and the environment. We
highlight the risk of batch effects and discuss strategies to reduce the
impacts of batch effects in multi-year high-throughput sequencing studies.
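
The confounding described above (populations distributed non-randomly across read lengths) can be screened for by cross-tabulating samples by population and read length. The following is a minimal sketch, not the authors' method: the population labels, read lengths, and sample counts are invented for illustration and do not come from the dataset, and `cramers_v` is a hypothetical helper. A Cramér's V close to 1 suggests that population and read length are almost fully confounded, while a value close to 0 suggests a balanced design.

```python
from collections import Counter
from math import sqrt


def cramers_v(pairs):
    """Cramér's V for a list of (population, read_length) pairs.

    Returns a value in [0, 1]; 1 means each population was sequenced
    at exactly one read length (complete confounding).
    """
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(pop for pop, _ in pairs)
    cols = Counter(length for _, length in pairs)
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0  # association is undefined with a single category
    chi2 = 0.0
    for pop, n_pop in rows.items():
        for length, n_len in cols.items():
            expected = n_pop * n_len / n
            observed = joint.get((pop, length), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical metadata: each population sequenced at one read length only.
confounded = [("pop_A", 100)] * 10 + [("pop_B", 150)] * 10

# Hypothetical metadata: populations spread evenly across read lengths.
balanced = (
    [("pop_A", 100)] * 5 + [("pop_A", 150)] * 5
    + [("pop_B", 100)] * 5 + [("pop_B", 150)] * 5
)

v_confounded = cramers_v(confounded)
v_balanced = cramers_v(balanced)
print(f"confounded design: V = {v_confounded:.2f}")
print(f"balanced design:   V = {v_balanced:.2f}")
```

A high value does not by itself prove that false SNPs are present; it only flags that any read-length artefact would be indistinguishable from a population difference.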
Provider:
Dryad
Created:
2018-03-02



