The limits of the constant-rate birth-death prior for phylogenetic tree topology inference
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.2fqz612vg
Description:
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under constant-rate birth-death (crBD) tend to differ from empirical trees, for example with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between crBD and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which crBD differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used crBD priors with those that used other non-BD Bayesian and non-Bayesian methods, we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using crBD in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under crBD, leading to systematic bias in data sets with limited information content.
Methods
Empirical trees used in the study are trees from the literature, collected by TimeTree (timetree.org).
1. Run ```Tree_Selection.R``` to select the empirical phylogenetic trees to be included from TimeTree. The output file ```final_timetrees.RData``` contains the final subset of empirical phylogenetic TimeTree trees used for analysis, with anonymized tip labels.
2. Run ```Simulation_And_Analysis.R``` to fit birth and death parameters (assuming rho = 1) for each of the 1189 empirical trees, simulate 1000 trees per empirical tree, calculate tree index values for both empirical and simulated trees, and calculate z-scores comparing the simulated and empirical trees. Note that calculating the tree index values for the simulated trees is VERY time-consuming due to the number of trees. Run ```Supplementary_Fig_S1_Analysis.R``` to generate data for Supplementary Figure S1.
3. Run ```Meta_analysis.R``` to run the linear regression models to investigate the role of the prior/analysis type for the subset (n=300) of the included empirical trees. The metadata for the 300 trees can be found in the supplementary files (Table S3).
4. Run ```Imbalance_Simulation.R``` to run the simulations for the imbalanced data subset (100 trees). Simulated sequences for each tree were then analyzed with RevBayes and IQ-TREE 2. Note: the three substitution rates used (0.5, 0.05, 0.005) are referred to as Rates 2-4 in the code; there is no Rate 1. The shell scripts to run the inferences in each software are as follows:
   4a. RevBayes: ```fasta_to_revbayes_code_rate2.sh```, ```fasta_to_revbayes_code_rate3.sh```, ```fasta_to_revbayes_code_rate4.sh```
       These shell scripts use the following .Rev files: ```MCMC_Revbayes_code_rate2.Rev```, ```MCMC_Revbayes_code_rate3.Rev```, ```MCMC_Revbayes_code_rate4.Rev```
       They rely on the following supplementary .Rev files: ```tree_BD.Rev```, ```sub_JC.Rev```, ```clock_global.Rev```
   4b. IQ-TREE 2: ```fasta_to_iqtree_code.sh```
5. Run ```Final_Figures.R``` to visualize the results.
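
The z-score comparison in step 2 can be illustrated with a minimal Python sketch. It is not the authors' R code and makes several simplifying assumptions: only one topology index (the Colless imbalance index) is used, and instead of fitting birth and death rates it draws tree shapes from the equal-rates Markov (Yule) model, which gives the same distribution of tree shapes as constant-rate birth-death once the number of tips is fixed. The function names and the caterpillar "empirical" tree are hypothetical stand-ins chosen for illustration.

```python
import random
import statistics


def colless_caterpillar(n_tips):
    """Colless index of a fully imbalanced (caterpillar) tree with n_tips tips."""
    return (n_tips - 1) * (n_tips - 2) // 2


def simulate_erm_colless(n_tips, rng):
    """Colless index of one random tree shape under the ERM/Yule model.

    Under this model the size of the left subtree at a node with n
    descendant tips is uniform on 1..n-1, so the shape can be sampled
    by recursive splitting without simulating branch lengths.
    """
    total = 0
    stack = [n_tips]
    while stack:
        n = stack.pop()
        if n < 2:
            continue  # a single tip has no split
        left = rng.randint(1, n - 1)
        right = n - left
        total += abs(left - right)  # contribution of this node to Colless
        stack.append(left)
        stack.append(right)
    return total


def z_score(empirical_value, simulated_values):
    """Standardize an empirical index value against simulated values."""
    mean = statistics.fmean(simulated_values)
    sd = statistics.stdev(simulated_values)
    return (empirical_value - mean) / sd


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, for reproducibility only
    n_tips = 50  # hypothetical tree size
    sims = [simulate_erm_colless(n_tips, rng) for _ in range(1000)]
    # Hypothetical "empirical" tree: the most imbalanced shape possible.
    print(z_score(colless_caterpillar(n_tips), sims))
```

A large positive z-score here means the tree is more imbalanced than the simulated expectation. The real pipeline works with many indices and with trees simulated from fitted parameters, so this sketch only conveys the shape of the calculation.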
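
For the regression in step 3, a rough Python sketch of the simplest case may help: an ordinary least squares fit of an index z-score on a single binary indicator (for example, 1 if a study used a crBD prior and 0 otherwise). The coding of the predictor, the function name, and the toy values are assumptions made for illustration; the actual models and covariates are defined in the authors' R script and Table S3.

```python
def ols_binary(y, x):
    """Least-squares intercept and slope for y regressed on one predictor x.

    With a 0/1 predictor, the intercept is the mean of the x == 0 group
    and the slope is the difference between the two group means.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Toy values only, not data from the study.
    z_scores = [1.0, 2.0, 3.0, 5.0]
    used_crbd = [0, 0, 1, 1]
    print(ols_binary(z_scores, used_crbd))
```

A slope close to zero relative to its uncertainty would correspond to the reported absence of a significant difference between analysis types; the significance test itself is left out of this sketch.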
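
To see why the lowest substitution rate in step 4 leaves little signal in the data, the following Python sketch computes the expected proportion of differing sites between two sequences under the Jukes-Cantor (JC69) model. That JC69 is the substitution model is an assumption based on the file name sub_JC.Rev, and treating each rate as the expected number of substitutions per site over one unit of branch length is a further assumption made here, as is the alignment length.

```python
import math


def jc69_diff_prob(d):
    """Expected proportion of differing sites under JC69.

    d is the expected number of substitutions per site separating the
    two sequences; the proportion saturates at 0.75 as d grows.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    alignment_length = 1000  # hypothetical number of sites
    for rate in (0.5, 0.05, 0.005):  # the three rates named in step 4
        p = jc69_diff_prob(rate)  # assumes one unit of branch length
        print(rate, p, p * alignment_length)
```

At the lowest rate only a small fraction of sites are expected to differ along such a branch, which is consistent with the prior having more influence on the inferred topology when substitution rates are low.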
Created:
2023-11-06



