Data from: Weighted quartets phylogenetics

DataONE · 2015-02-06 · updated 2024-06-27 · indexed
Download link:
https://search.dataone.org/view/null
Description:
Despite impressive advances in both technology and theoretical tools, constructing phylogenetic (evolutionary) trees is still a challenging task. In particular, the enormous quantity of molecular data available for taxonomic classification has drawn attention to large-scale phylogenetic reconstruction. A key tool in this direction is to construct separate trees over different, overlapping subsets of the species and subsequently combine these trees into a single tree over the full set. A quartet tree (a tree over four species) is the most basic informational phylogenetic unit. Amalgamating such quartets into a single tree lies at the heart of many tasks in phylogenetics, yet it remains a daunting task, particularly in light of conflicting signals, a common reality in biology. Assigning weights to these quartets to indicate importance or reliability was proposed more than a decade ago. Handling weighted quartets is even more challenging and has scarcely been addressed in the past. In this work we focus on weighted quartet-based approaches. We propose a scheme to assign weights to quartets coming from weighted trees and devise a tree similarity measure for weighted trees based on weighted quartets. We also extend the quartet MaxCut (QMC) algorithm to handle weighted quartets using the tools above. We evaluated these tools through extensive experiments on both simulated and real data and compared the weighted QMC (wQMC) with the most prevalent tool, MRP. Our results show that wQMC is significantly superior to MRP by all measures. Our real-data results on cyanobacterial gene trees reinforce previous results achieved by other tools.
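
The basic quantity behind quartet amalgamation can be illustrated with a short sketch. Assuming unrooted trees stored as adjacency dicts and weighted quartets given as ((a, b), (c, d), w) for the topology ab|cd (both hypothetical conventions, not the dataset's file format), the code reads the topology a tree induces on four leaves via the four-point condition and reports the fraction of total quartet weight the tree agrees with. Broadly, QMC-style amalgamation seeks a tree that satisfies as many input quartets (or, in the weighted setting, as much quartet weight) as possible. This is a generic illustration only, not the authors' weighting scheme or similarity measure, which the description does not spell out.

```python
from collections import deque

# Hypothetical conventions for this sketch (not the dataset's file format):
# an unrooted tree is an adjacency dict {node: [neighbours]}, and a weighted
# quartet is ((a, b), (c, d), w), meaning "topology ab|cd with weight w".


def distances_from(tree, source):
    """Breadth-first edge-count distances from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def quartet_topology(tree, a, b, c, d):
    """Topology the tree induces on {a, b, c, d}, or None if unresolved.

    Four-point condition: the pairing with the strictly smallest sum of
    path lengths is the induced split (e.g. ab|cd).
    """
    dist = {x: distances_from(tree, x) for x in (a, b, c)}
    pairings = sorted(
        [
            ((a, b), (c, d), dist[a][b] + dist[c][d]),
            ((a, c), (b, d), dist[a][c] + dist[b][d]),
            ((a, d), (b, c), dist[a][d] + dist[b][c]),
        ],
        key=lambda p: p[2],
    )
    if pairings[0][2] == pairings[1][2]:
        return None  # no strict minimum: the quartet is a star in this tree
    left, right, _ = pairings[0]
    return frozenset({frozenset(left), frozenset(right)})


def weighted_satisfaction(tree, quartets):
    """Fraction of total quartet weight whose topology the tree agrees with."""
    total = satisfied = 0.0
    for (a, b), (c, d), w in quartets:
        total += w
        wanted = frozenset({frozenset((a, b)), frozenset((c, d))})
        if quartet_topology(tree, a, b, c, d) == wanted:
            satisfied += w
    return satisfied / total if total else 0.0


# Example: a 5-leaf tree with splits AB|CDE and ABE|CD (internal nodes u, v, w).
example_tree = {
    "A": ["u"], "B": ["u"], "u": ["A", "B", "v"],
    "v": ["u", "E", "w"], "E": ["v"],
    "w": ["v", "C", "D"], "C": ["w"], "D": ["w"],
}
example_quartets = [
    (("A", "B"), ("C", "D"), 2.0),  # agrees with the tree
    (("A", "C"), ("B", "D"), 1.0),  # conflicts with the tree
    (("A", "B"), ("C", "E"), 1.0),  # agrees with the tree
]
print(weighted_satisfaction(example_tree, example_quartets))  # 0.75
```

The sketch ignores branch lengths and simply takes each quartet's weight as given; a scheme that derives quartet weights from weighted input trees, as described above, would compute those weights from the source trees instead. In a fully binary tree every quartet is resolved, so `None` only arises from polytomies.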

Created: 2015-02-06