Completing gene trees without species trees in sub-quadratic time
Source: DataONE. Updated 2023-06-26. Indexed 2025-08-09.
Download link:
https://search.dataone.org/view/sha256:b35eed1ad4051985a372650837419c7acbe18feb241716714f7d27ce3b10149f
Resource description:
Motivation: As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems are not studied in depth when we lack such a reference species tree.
Results: We study completion of gene trees without a need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the give..., We test tripVote on published simulated datasets and a real plant dataset by OneKP Initiative. The simulated datasets were both created using Simphy to generate gene and species trees under the multi-species coalescent model and heterogeneous parameters, and Indelible to simulate nucleotide sequences on gene trees according to the GTR + Γ model with varying sequence lengths and sequence evolution parameters. FastTree2 was used to estimate gene trees based on the GTR + Γ model.
The 201-taxon dataset by Mirarab and Warnow (2015) includes incomplete lineage sorting (ILS). We use 3 model conditions with 200 taxa where tree length is 2M and speciation rate is 1e-6, and use the first 20 out of the 50 replicates of the original dataset (to save computational time). In each replicate, we use the first 500 (out of 1000) estimated gene trees that are fully resolved. The three model conditions have medium, high, and very high levels of ILS, resulting in 21, 33, and 69% RF distance between true gen...,
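The RF (Robinson-Foulds) percentages quoted above count the nontrivial bipartitions found in one tree but not the other, divided by the maximum possible count. The following is a minimal sketch of that calculation, not code from tripVote or from this dataset. It assumes trees are given as nested Python tuples of leaf names, that both trees have the same leaf set, and that normalization uses 2(n - 3), the maximum for fully resolved unrooted trees on n leaves. The function names `leaf_sets`, `bipartitions`, and `normalized_rf` are hypothetical.

```python
def leaf_sets(tree):
    """Return (all leaves under `tree`, leaf sets of every internal clade)."""
    if not isinstance(tree, tuple):
        # A leaf: it contributes its own name and no internal clades.
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def bipartitions(tree):
    """Return (leaf set, nontrivial splits) with the tree treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the result does not depend on where the tuple is rooted.
    """
    all_leaves, clades = leaf_sets(tree)
    anchor = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return all_leaves, splits


def normalized_rf(tree_a, tree_b):
    """Normalized RF distance in [0, 1] for two trees on the same leaves."""
    leaves_a, splits_a = bipartitions(tree_a)
    leaves_b, splits_b = bipartitions(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    n = len(leaves_a)
    if n < 4:
        return 0.0  # no nontrivial splits exist below four leaves
    return len(splits_a ^ splits_b) / (2 * (n - 3))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(normalized_rf(t1, t2))
```

Because splits are compared rather than rooted clades, two tuples that describe the same unrooted tree with different rootings get a distance of zero, which suits unrooted gene trees. Comparing trees with different leaf sets, as with incomplete gene trees, would first require restricting both to their shared leaves, which this sketch does not do.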
Created:
2025-07-22



