Data from: A k-mer-based approach for phylogenetic classification of taxa in environmental genomic data
DataCite Commons | Updated 2025-05-01 | Indexed 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.08kprr55z
Description:
In the age of genome sequencing, whole genome data is readily and
frequently generated, leading to a wealth of new information that can be
used to advance various fields of research. New approaches, such as
alignment-free phylogenetic methods that utilize k-mer-based distance
scoring, are becoming increasingly popular given their ability to rapidly
generate phylogenetic information from whole genome data. However, these
methods have not yet been tested on environmental data, which is often
highly fragmented and incomplete. Here we compare the results
of one alignment-free approach (which utilizes the D2 statistic) to
traditional multi-gene maximum likelihood trees in three algal groups that
have high-quality genome data available. In addition, we simulate
lower-quality, fragmented genome data using these algae to test the method's
robustness to genome quality and completeness. Finally, we apply the
alignment-free approach to environmental metagenome-assembled genome data
of unclassified Saccharibacteria and Trebouxiophyte algae, and single-cell
amplified data from uncultured marine stramenopiles to demonstrate its
utility with real datasets. We find that in all instances, the
alignment-free method produces phylogenies that are comparable to, and often
more informative than, those created using the traditional multi-gene
approach. The k-mer-based method performs well even when a substantial
amount of data is missing, including marker genes traditionally used
for tree reconstruction. Our results demonstrate the value of
alignment-free approaches for classifying novel, often cryptic or rare
species that may not be culturable or may be difficult to access using
single-cell methods, but that fill important gaps in the tree of life.
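In its basic form, the D2 statistic mentioned in the description is the inner product of two genomes' k-mer count vectors, D2(A, B) = sum over all k-mers w of X_w * Y_w, where X_w and Y_w are the counts of w in A and B. The Python below is a minimal sketch of that idea, not the study's pipeline or software. It counts k-mers contig by contig, computes D2, and converts it into a distance with a cosine-style normalization. That normalization is an assumption of this example; published variants such as D2S and D2* normalize differently. The helpers `random_genome`, `mutate`, and `fragment` are hypothetical and only build a toy dataset in which one genome is cut into 40 pieces and half of them are discarded, loosely mimicking an incomplete, fragmented assembly.

```python
import math
import random
from collections import Counter

VALID_BASES = frozenset("ACGT")


def kmer_counts(contigs, k):
    """Count overlapping k-mers across all contigs of one genome.

    Counting contig by contig means no k-mer spans the junction between two
    contigs; k-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    counts = Counter()
    for contig in contigs:
        contig = contig.upper()
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    return counts


def d2(x, y):
    """Basic D2 statistic: sum over every k-mer w of x[w] * y[w]."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the smaller table; missing keys count as 0
    return sum(c * y[w] for w, c in x.items())


def d2_distance(x, y):
    """Cosine-style normalization of D2 into a distance in [0, 1].

    ASSUMPTION: chosen for this sketch only; D2S, D2* and other published
    variants normalize differently.
    """
    denom = math.sqrt(d2(x, x) * d2(y, y))
    return 1.0 - d2(x, y) / denom if denom else 1.0


# --- Hypothetical helpers that build a toy dataset (not from the study) ---
def random_genome(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq, rate, rng):
    """Apply independent point substitutions at the given per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def fragment(genome, n_pieces, keep_fraction, rng):
    """Cut a genome at random points and keep a random subset of the pieces."""
    cuts = sorted(rng.sample(range(1, len(genome)), n_pieces - 1))
    bounds = [0] + cuts + [len(genome)]
    pieces = [genome[a:b] for a, b in zip(bounds, bounds[1:])]
    n_keep = max(1, round(keep_fraction * len(pieces)))
    return rng.sample(pieces, n_keep)


rng = random.Random(42)
K = 10
genome_a = random_genome(20_000, rng)
genome_b = mutate(genome_a, 0.05, rng)     # close relative of A
genome_c = random_genome(20_000, rng)      # unrelated genome
frag_a = fragment(genome_a, 40, 0.5, rng)  # A as 40 pieces, half discarded

counts = {
    "A": kmer_counts([genome_a], K),
    "A_fragmented": kmer_counts(frag_a, K),
    "B": kmer_counts([genome_b], K),
    "C": kmer_counts([genome_c], K),
}
for name in ("A", "A_fragmented"):
    print(f"{name}: vs B = {d2_distance(counts[name], counts['B']):.3f}, "
          f"vs C = {d2_distance(counts[name], counts['C']):.3f}")
```

On this toy data, the fragmented copy of A, which retains only about half of its sequence, should still lie closer to its relative B than to the unrelated genome C. This mirrors the robustness argument qualitatively but does not reproduce the study's analyses. Counting k-mers per contig, rather than concatenating contigs, avoids creating spurious k-mers at the artificial junctions between contigs.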
Provider:
Dryad
Created:
2023-06-21



