Toward a semi-supervised learning approach to phylogenetic estimation
DataONE | Updated 2024-05-28 | Indexed 2024-06-08
Download link:
https://search.dataone.org/view/https://doi.org/10.5061/dryad.qz612jmn6
Description:
Models have always been central to inferring molecular evolution and to reconstructing phylogenetic trees. Their use typically involves the development of a mechanistic framework reflecting our understanding of the underlying biological processes, such as nucleotide substitutions, and the estimation of model parameters by maximum likelihood or Bayesian inference. However, deriving and optimizing the likelihood of the data is not always possible under complex evolutionary scenarios or even tractable for large datasets, often leading to unrealistic simplifying assumptions in the fitted models. To overcome this issue, we coupled stochastic simulations of genome evolution with a new supervised deep learning model to infer key parameters of molecular evolution. Our model is designed to directly analyze multiple sequence alignments and estimate per-site evolutionary rates and divergence, without requiring a known phylogenetic tree. The accuracy of our predictions matched that of likelihood-ba...
# Toward a semi-supervised learning approach to phylogenetic estimation
The latest phyloRNN code is available here: [https://github.com/phyloRNN/](https://github.com/phyloRNN/)
This repository hosts the version used in the manuscript and includes pre-trained models and the empirical data analyzed in the paper.
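The abstract describes a model that analyzes multiple sequence alignments directly. A neural network needs a numeric input, and a common choice is a one-hot encoding of each nucleotide. The sketch below assumes such an encoding with a sites-first layout. The name `one_hot_alignment`, the channel order, and the treatment of gaps and ambiguity codes as all-zero vectors are illustrative assumptions. They are not a description of phyloRNN's actual preprocessing.

```python
import numpy as np

# Assumed encoding (illustrative, not necessarily phyloRNN's): one channel per
# nucleotide; gaps, 'N', and other ambiguity codes become all-zero vectors.
NUC_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_alignment(seqs: list[str]) -> np.ndarray:
    """Encode aligned sequences as a float32 array of shape (n_sites, n_taxa, 4)."""
    if not seqs:
        raise ValueError("alignment is empty")
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    out = np.zeros((n_sites, len(seqs), 4), dtype=np.float32)
    for j, seq in enumerate(seqs):
        for i, base in enumerate(seq.upper()):
            k = NUC_INDEX.get(base)
            if k is not None:  # unrecognized characters stay all-zero
                out[i, j, k] = 1.0
    return out


x = one_hot_alignment(["ACGT-", "ACGTA", "TCGNA"])
print(x.shape)  # (5, 3, 4)
```

Putting sites on the first axis makes it straightforward to pass the alignment site by site to a sequence model, such as a recurrent network, that emits one output (for example, a rate) per site. Whether the published model uses this exact layout is not stated here.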
## Empirical data
The file *Chr1.WGAlign.FromBam.Filtered.fasta* contains the alignment of chromosome 1 across clownfish species in FASTA format.
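The alignment is a plain FASTA file, so it can be inspected without special tooling. The following is a minimal standard-library sketch for reading it and checking that it is a proper alignment, with all sequences the same length. The helper names `read_fasta` and `summarize_alignment` are hypothetical. The sketch also assumes that missing data are coded as `-`, `N`, or `?`, which the dataset description does not specify.

```python
from io import StringIO
from typing import TextIO


def read_fasta(handle: TextIO) -> dict[str, str]:
    """Parse FASTA records into {name: sequence}; wrapped sequence lines are joined."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word of the header is the name
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def summarize_alignment(records: dict[str, str]) -> tuple[int, int, float]:
    """Return (n_taxa, n_sites, fraction of '-', 'N', or '?' characters)."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length (or none found); not an alignment")
    n_sites = lengths.pop()
    missing = sum(seq.upper().count(c) for seq in records.values() for c in "-N?")
    return len(records), n_sites, missing / (n_sites * len(records))


# For the dataset file (adjust the path to wherever it was downloaded):
# with open("Chr1.WGAlign.FromBam.Filtered.fasta") as fh:
#     print(summarize_alignment(read_fasta(fh)))
demo = StringIO(">sp1 first species\nACGT\nAC\n>sp2\nACG-\nNN\n")
print(summarize_alignment(read_fasta(demo)))  # (2, 6, 0.25)
```

This version holds the whole alignment in memory, which may be costly for a chromosome-scale file. Biopython's `Bio.SeqIO.parse(handle, "fasta")` is an alternative that yields records one at a time.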
## Supplementary Table
The full Table S7 discussed in the article is available in the *data_and_scripts/Supplementary_information* directory.
The table shows the results of 600 simulations (rows) under different rate-heterogeneity models and tree lengths. For each simulation, the log-likelihood of the simulated data is computed given the true tree, given the nucleotide substitution matrix estimated under gamma-distributed rates, and given site-specific rates. Site-specific rates are either posteriors under the gamma model, posteriors under the free-rates model, estimated...
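The table compares log-likelihoods of the same data under different assumptions about per-site rates. The sketch below shows how such rates enter the likelihood in the simplest possible setting, and it rests on several assumptions.

- It uses two aligned sequences under the Jukes–Cantor (JC69) model in place of the estimated nucleotide matrix.
- It uses a two-taxon "tree" with path length `d` in place of the true tree.
- The function names, sequences, and rate values are all hypothetical.

In the sketch, site *i* evolves over length `rates[i] * d`. The category version averages each site's likelihood over equal-weight rate categories, which is the structure of a discrete-gamma model. For larger trees, Felsenstein's pruning algorithm replaces the pairwise formula. This is an illustration, not the computation used to build Table S7.

```python
import math


def jc69_pair_site_lik(x: str, y: str, t: float) -> float:
    """Likelihood of bases x and y at one site of a two-taxon tree with path length t (JC69)."""
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e  # transition probability P_xy(t)
    return 0.25 * p  # stationary frequency of x (1/4 under JC69) times P_xy(t)


def loglik_site_rates(s1: str, s2: str, d: float, rates: list[float]) -> float:
    """Log-likelihood when site i has its own rate multiplier rates[i].
    Rates must be positive wherever the two bases differ (else log(0))."""
    if not (len(s1) == len(s2) == len(rates)):
        raise ValueError("sequences and rates must have equal length")
    return sum(math.log(jc69_pair_site_lik(a, b, r * d)) for a, b, r in zip(s1, s2, rates))


def loglik_rate_categories(s1: str, s2: str, d: float, cat_rates: list[float]) -> float:
    """Log-likelihood averaging each site over equal-weight rate categories,
    the structure of a discrete-gamma model."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for a, b in zip(s1, s2):
        site_lik = sum(jc69_pair_site_lik(a, b, r * d) for r in cat_rates) / len(cat_rates)
        total += math.log(site_lik)
    return total


s1, s2 = "AAAACCCCGG", "AAAACCCTGA"  # toy data: differences at sites 8 and 10
d = 0.2  # hypothetical divergence, expected substitutions per site
uniform = loglik_site_rates(s1, s2, d, [1.0] * 10)
# Hypothetical site-specific rates (mean 1.0), higher at the two variable sites.
site_specific = loglik_site_rates(s1, s2, d, [0.5] * 7 + [3.0, 0.5, 3.0])
# Hypothetical two-category mixture with mean rate 1.0.
mixture = loglik_rate_categories(s1, s2, d, [0.5, 1.5])
print(uniform, site_specific, mixture)
```

In this toy case the site-specific rates yield a higher log-likelihood than uniform rates, because the hypothetical rates were chosen to be high exactly at the two variable sites. With a single category of rate 1.0, the category version reduces to the uniform-rate case.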
Created: 2024-05-29



