Data from: Phylogenomic subsampling and the search for phylogenetically reliable loci
Source: DataCite Commons. Updated: 2026-03-05. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.sj3tx9646
Description:
Phylogenomic subsampling is a procedure by which small sets of loci are
selected from large genome-scale datasets and used for phylogenetic
inference. This step is often motivated either by computational
limitations associated with the use of complex inference methods or by the
need to test the robustness of phylogenetic results by discarding loci
that are deemed potentially misleading. Although many alternative methods
of phylogenomic subsampling have been proposed, little effort has gone
into comparing their behavior across different datasets. Here, I calculate
multiple gene properties for a range of phylogenomic datasets spanning
animal, fungal and plant clades, uncovering a remarkable predictability in
their patterns of covariance. I also show how these patterns provide a
means for ordering loci by both their rate of evolution and their relative
phylogenetic usefulness. This method of retrieving phylogenetically useful
loci is found to be among the top performing when compared to alternative
subsampling protocols. Relatively common approaches such as minimizing
potential sources of systematic bias or increasing the clock-likeness of
the data are found to fare worse than selecting loci at random. Likewise,
the general utility of rate-based subsampling is found to be limited: loci
evolving at both low and high rates are among the least effective, and
even those evolving at optimal rates can still differ widely in
usefulness. This study shows that many common subsampling approaches
introduce unintended effects in off-target gene properties, and proposes
an alternative multivariate method that simultaneously optimizes
phylogenetic signal while controlling for known sources of bias.
Provider:
Dryad
Created:
2021-06-09



