Accelerating maximum likelihood phylogenetic inference via early stopping to evade (over-)optimization
Source: DataONE · Updated: 2025-06-30 · Indexed: 2025-07-19
Download link:
https://search.dataone.org/view/sha256:052155ab8561fc67327eff7bb3cad5f883724483bd8a7752a04993d63d81d3e8
Resource description:
Maximum Likelihood (ML) based phylogenetic inference constitutes a challenging optimization problem. Given a set of aligned input sequences, phylogenetic inference tools strive to determine the tree topology, the branch-lengths, and the evolutionary model parameters that maximize the phylogenetic likelihood function. However, there exist compelling reasons to not push optimization to its limits, by means of early, yet adequate stopping criteria. Since input sequences are typically subject to stochastic and systematic noise, caution is warranted to prevent over-optimization and the risk of overfitting the model to noisy data. To address this, we integrate the Kishino-Hasegawa (KH) test into RAxML-NG as a reliable and fast-to-compute Early Stopping criterion to effectively limit excessive and compute-intensive over-optimization. Initially, we introduce a simplified heuristic tree search strategy in RAxML-NG (sRAxML-NG) as an underlying method for Early Stopping. Subseq...

# Data from: Accelerating maximum likelihood phylogenetic inference via early stopping to evade (over-)optimization
This repository contains the datasets used in our manuscript:
* Anastasis Togkousidis, Alexandros Stamatakis, Olivier Gascuel, Accelerating Maximum Likelihood Phylogenetic Inference via Early Stopping to Evade (Over-)optimization, *Systematic Biology*, 2025, syaf043, [https://doi.org/10.1093/sysbio/syaf043](https://doi.org/10.1093/sysbio/syaf043)
The study compares Early Stopping (ES) methods in Maximum Likelihood (ML) phylogenetic tree inference against standard RAxML-NG v1.2. The ES versions are implemented as separate versions within RAxML-NG:
* Simplified RAxML-NG (sRAxML-NG)
* KH version, i.e., RAxML-NG using the KH test (Kishino & Hasegawa, 1989)
* KH-multiple testing version (KH version with multiple testing correction)
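For readers unfamiliar with the KH test named in the list above, the following is a minimal sketch of its normal-approximation form applied to per-site log-likelihoods of two trees. It is not taken from RAxML-NG: the function names `kh_test` and `should_stop`, the two-sided p-value, and the `alpha` threshold are illustrative assumptions, and how the KH versions actually apply the test during tree search is described in the manuscript, not here.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """KH test (normal approximation) on per-site log-likelihoods.

    Returns (delta, z, p), where delta = lnL(A) - lnL(B) and p is a
    two-sided p-value. Hypothetical helper, not RAxML-NG code.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need the same number of sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are required")

    # Per-site log-likelihood differences between the two trees.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n

    # Variance of the summed difference, estimated from per-site spread.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0.0:
        # Degenerate case: all per-site differences are identical.
        return delta, 0.0, (1.0 if delta == 0.0 else 0.0)

    z = delta / math.sqrt(var_delta)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return delta, z, p


def should_stop(old_site_lnl, new_site_lnl, alpha=0.05):
    """Assumed stopping rule: stop when the new tree is not
    significantly different from the old one at level alpha."""
    _, _, p = kh_test(new_site_lnl, old_site_lnl)
    return p >= alpha
```

In this sketch, a search loop would call `should_stop` after each round of topology moves and terminate once the likelihood gain is no longer statistically distinguishable from noise.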
The repository includes:
* 300 large empirical MSAs
* 1,076 simulated MSAs
Datasets are organized into three main subfolders: `empirical-lon...,
Created:
2025-07-01



