Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments
Source: DataONE. Updated 2023-06-29; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:1959c925ea857354b8f9e8ef6669592557f0ecd7186866138a2035ec067ea2f8
Description:
Machine learning can be as good as maximum likelihood when reconstructing phylogenetic topologies and determining the best evolutionary model on four taxon alignments.
Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is the Maximum Likelihood tree reconstruction method. Here we show that for quartet trees, machine learning using neural networks can be as good as the Maximum Likelihood method at inferring the best tree topology and the best model of sequence evolution for nucleotide as well as amino acid sequences. For this purpose we simulated data sets for a wide range of branch lengths, evolutionary models and model parameters, and compared the topologies and models inferred with machine learning with those obtained with the Maximum Likelihood and Neighbour Joining methods. Our results show that neural networks are a promising avenue for determining relatedness between taxa, which is ...
This archive is part of the DeepNNPhylogeny project, for which the software code is available on GitHub. It contains pre-trained neural networks to predict (a) the best models of sequence evolution and (b) the best quartet tree topologies for alignments of four nucleotide or amino acid sequences. For each use case, six neural networks with different architectures have been trained and saved for further use with the Python library TensorFlow. The neural networks have been saved with the tf.keras.Model.save function in the TensorFlow SavedModel format.
All neural networks have been trained with a large number of alignments simulated with the software PolyMoSim v1.1.4, which is available on GitHub. For each simulated data set, model parameters (including proportion of invariant sites, shape parameter of the gamma distribution for site heterogeneity, transition/transversion ratio - if applicable, nucleotide base frequencies - if applicable, relative substitution ...
In this project, neural networks have been trained to:
- predict/classify the correct topology for four nucleotide or amino acid sequences that evolved on a quartet tree.
- predict the best model of sequence evolution for four nucleotide or amino acid sequences that evolved on a quartet tree.
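For four taxa there are exactly three possible unrooted tree topologies, so the topology task is a three-class classification problem. The following minimal Python sketch enumerates these topologies as Newick strings and maps a vector of class probabilities to one of them. The function names and the order of the classes are illustrative assumptions and are not taken from the DeepNNPhylogeny software.

```python
def quartet_topologies(taxa):
    """Return the three unrooted topologies for four taxa as Newick strings.

    The ordering of the returned list is an arbitrary choice made for this
    sketch; it is NOT necessarily the class order used by the archived networks.
    """
    if len(taxa) != 4 or len(set(taxa)) != 4:
        raise ValueError("expected exactly four distinct taxon names")
    first = taxa[0]
    topologies = []
    # Each topology is fixed by which taxon is the sister of the first taxon.
    for partner in taxa[1:]:
        others = [t for t in taxa[1:] if t != partner]
        topologies.append(f"(({first},{partner}),({others[0]},{others[1]}));")
    return topologies


def predicted_topology(probabilities, taxa):
    """Pick the topology with the highest class probability (hypothetical helper)."""
    if len(probabilities) != 3:
        raise ValueError("expected one probability per quartet topology")
    best = max(range(3), key=lambda i: probabilities[i])
    return quartet_topologies(taxa)[best]


if __name__ == "__main__":
    for newick in quartet_topologies(["A", "B", "C", "D"]):
        print(newick)
```

Before interpreting real predictions, the mapping from class index to topology should be checked against the DeepNNPhylogeny documentation.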
Together with the software in the DeepNNPhylogeny project, the pre-trained neural networks can be used for the model and topology classification tasks.
The GitHub repository DeepNNPhylogeny contains the software with which:
a) the neural networks presented here have been trained and with which new neural networks can be trained,
b) predictions can be made using the pre-trained neural networks available in this archive. For alignments of four nucleotide or amino acid sequences, they predict the best evolutionary model and the best topology with an accuracy close to or identical to that of the Maximum Likelihood method.
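Outside the DeepNNPhylogeny software, a network stored in the SavedModel format can in principle be opened directly with TensorFlow. The sketch below is one possible way to do this, not the project's own loading code. It assumes a network has been unpacked into a local directory; the path shown is a placeholder, and both helper names are hypothetical. It uses tf.saved_model.load, TensorFlow's generic SavedModel loader. Depending on the installed TensorFlow and Keras versions, tf.keras.models.load_model may also accept such a directory.

```python
from pathlib import Path


def is_saved_model_dir(path):
    """Return True if `path` looks like a TensorFlow SavedModel directory.

    A SavedModel written by tf.keras.Model.save is a directory that contains
    a `saved_model.pb` file.
    """
    directory = Path(path)
    return directory.is_dir() and (directory / "saved_model.pb").is_file()


def load_network(path):
    """Load a SavedModel directory with TensorFlow (hypothetical helper)."""
    if not is_saved_model_dir(path):
        raise FileNotFoundError(f"{path} does not look like a SavedModel directory")
    # Imported lazily so the directory check above works without TensorFlow.
    import tensorflow as tf
    return tf.saved_model.load(str(path))


if __name__ == "__main__":
    # Placeholder path: replace with a network directory unpacked from the archive.
    model_dir = "path/to/unpacked/network"
    if is_saved_model_dir(model_dir):
        network = load_network(model_dir)
        # The available entry points depend on how the model was exported.
        print(list(network.signatures.keys()))
    else:
        print(f"No SavedModel found at {model_dir}")
```

The input format each network expects is defined by the DeepNNPhylogeny software, so predictions on real alignments are best made through that software.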
The neural networks stored in this repository...
Created: 2023-11-29



