Data and scripts for: Read length dominates phylogenetic placement accuracy of ancient DNA reads
Source: DataCite Commons. Updated: 2025-05-09. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.4f4qrfjn3
Description:
One of the central problems facing researchers who analyze ancient DNA
(aDNA) is identifying the species which corresponds to the recovered aDNA.
Prior analysis of aDNA data normally uses sequence matching tools (such as
BLAST) to identify reads obtained from aDNA. However, as the source of
aDNA is often a previously unsampled taxon due to the taxon having gone
extinct prior to the advent of modern sequencing technology, it is likely
the case that there is no exact match in any database. As a consequence,
tools such as BLAST are of limited use in helping to place a read in a
phylogenetic context, i.e., identifying the likely source of a read on a
phylogenetic tree. Phylogenetic placement is a technique where a sequence
or read is placed onto a specific branch of a phylogenetic tree. These tools
offer the potential for a much finer resolution when identifying reads.
However, phylogenetic placement has primarily been used to place
reads obtained from extant sources. Phylogenetic placement's
applicability to aDNA data is complicated by the characteristic pattern of
degradation that aDNA undergoes. This characteristic damage is generally
not accounted for by popular phylogenetic placement tools, and as a
consequence some authors have cast doubt on the potential accuracy of such
tools. To understand how the characteristic aDNA damage affects
phylogenetic placement tools, we implemented a statistical model of aDNA
damage as a tool, which we call PyGargammel, that takes sequences and
applies damage characteristic of aDNA to them. We deploy PyGargammel, along with the
existing phylogenetic placement assessment pipeline PEWO, to 7 empirical
datasets. With this pipeline, we explore the parameter space of aDNA
damage via a grid search in order to identify the factors of aDNA damage
which are most impactful. We test 4 leading phylogenetic placement tools:
APPLES, EPA-ng, pplacer, and RAPPAS. We find that the frequency of
DNA backbone nicks (and consequently read length) is the primary driver of
error for aDNA reads. Additionally, we find that other factors, such as
the rate of A to G misincorporations, have a negligible effect on the
overall accuracy of phylogenetic placement tools.
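
The link between nick frequency and read length described above can be
illustrated with a minimal sketch. This is not PyGargammel's implementation
or interface: the function names, the per-bond `nick_rate` parameter, and the
assumption that nicks occur independently at each bond are all hypothetical
choices made for illustration. The loop at the end stands in for one axis of
a parameter grid.

```python
import random


def fragment_by_nicks(seq, nick_rate, rng):
    """Split seq wherever a simulated backbone nick occurs.

    Assumption (not from the dataset): each bond between adjacent bases
    is nicked independently with probability nick_rate.
    """
    fragments = []
    start = 0
    for i in range(1, len(seq)):
        if rng.random() < nick_rate:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def mean_length(fragments):
    """Average fragment (read) length in bases."""
    return sum(len(f) for f in fragments) / len(fragments)


# Toy random sequence; a real run would start from reference sequences.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(100_000))

# One axis of a hypothetical parameter grid: the nick rate.
for nick_rate in (0.005, 0.01, 0.02, 0.05):
    frags = fragment_by_nicks(genome, nick_rate, rng)
    print(nick_rate, len(frags), round(mean_length(frags), 1))
```

Under this simplified model the expected fragment length is roughly the
reciprocal of the nick rate, so higher nick frequencies yield shorter reads
and leave less sequence available for placement.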
Provider:
Dryad
Created:
2025-05-09



