Generalized bootstrap supports for phylogenetic analyses of protein sequences incorporating alignment uncertainty

Source: DataONE; updated 2020-06-30; indexed 2025-06-21
Download link:
https://search.dataone.org/view/sha256:ea6b05c89f096df59a7be585aa6d0ab4821aedb58b6d700f50f866fc6c9f392b
Description:
Phylogenetic reconstructions are essential in genomic data analyses and depend on accurate multiple sequence alignment (MSA) models. We show that all currently available large-scale progressive multiple alignment methods are numerically unstable when dealing with amino-acid sequences: they produce significantly different output when the sequence input order changes. We used the HOMFAM protein sequence dataset to show that on datasets larger than 100 sequences, this instability affects on average 21.5% of the aligned residues. The Maximum Likelihood trees estimated from these multiple sequence alignments are equally unstable, with over 38% of the branches being sensitive to the sequence input order. We established that about two-thirds of this uncertainty stems from the unordered nature of child nodes within the guide trees used to estimate MSAs. To quantify this uncertainty we developed unistrap, a novel approach that estimates the combined effect of alignment uncertainty and...
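
To make the notion of input-order instability concrete, the following Python sketch shows one simple way to compare two alignments of the same sequences, for example one produced from the original input order and one from a shuffled order. It is an illustration only, not the metric or code used by the authors: the function names (`residue_columns`, `unstable_fraction`, `shuffled_order`) and the toy alignments are hypothetical, and the aligner itself is left out. A residue is counted as unstable when the set of residues sharing its column differs between the two alignments.

```python
import random


def residue_columns(aligned):
    """Map each (sequence name, residue index) to the set of residues in its column.

    `aligned` is a dict of sequence name -> gapped string, all of equal length.
    """
    length = len(next(iter(aligned.values())))
    counters = {name: 0 for name in aligned}  # ungapped position per sequence
    partners = {}
    for col in range(length):
        members = []
        for name, row in aligned.items():
            if row[col] != "-":
                members.append((name, counters[name]))
                counters[name] += 1
        column = frozenset(members)  # order-independent column identity
        for member in members:
            partners[member] = column
    return partners


def unstable_fraction(aln_a, aln_b):
    """Fraction of residues whose column partners differ between two alignments."""
    cols_a = residue_columns(aln_a)
    cols_b = residue_columns(aln_b)
    changed = sum(1 for key in cols_a if cols_a[key] != cols_b.get(key))
    return changed / len(cols_a)


def shuffled_order(records, seed):
    """Return a reproducibly shuffled copy of the input records."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


# Toy alignments (made up for illustration) of the same three sequences,
# standing in for the output of an aligner run on two input orders.
aln_original = {"s1": "AC-GT", "s2": "ACAGT", "s3": "A--GT"}
aln_shuffled = {"s3": "A-G-T", "s1": "ACG-T", "s2": "ACAGT"}
score = unstable_fraction(aln_original, aln_shuffled)
```

In practice one would pass the output of `shuffled_order` to the aligner of choice, realign, and average `unstable_fraction` over several shuffles. The per-residue definition above is an assumption chosen for simplicity; other measures, such as the fraction of changed aligned residue pairs, would give different numbers.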

Created: 2025-06-15