Data from: Accurate inference of tree topologies from multiple sequence alignments using deep learning
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.ct2895s
Resource description:
Reconstructing the phylogenetic relationships between species is one of
the most formidable tasks in evolutionary biology. Multiple methods exist
to reconstruct phylogenetic trees, each with their own strengths and
weaknesses. Both simulation and empirical studies have identified several
“zones” of parameter space where accuracy of some methods can plummet,
even for four-taxon trees. Further, some methods can have undesirable
statistical properties such as statistical inconsistency and/or the
tendency to be positively misleading (i.e. assert strong support for the
incorrect tree topology). Recently, deep learning techniques have made
inroads on a number of both new and longstanding problems in biological
research. Here we designed a deep convolutional neural network (CNN) to
infer quartet topologies from multiple sequence alignments. This CNN can
readily be trained to make inferences using both gapped and ungapped data.
We show that our approach is highly accurate on simulated data, often
outperforming traditional methods, and is remarkably robust to
bias-inducing regions of parameter space such as the Felsenstein zone and
the Farris zone. We also demonstrate that the confidence scores produced
by our CNN can more accurately assess support for the chosen topology than
bootstrap and posterior probability scores from traditional methods. While
numerous practical challenges remain, these findings suggest that deep
learning approaches such as ours have the potential to produce more
accurate phylogenetic inferences.
Provider:
Dryad
Created:
2019-09-06



