Supporting data for "Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences"

Name: Supporting data for "Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences"
Creator: GigaScience Database
Published: 2025-05-26 17:20:35
License: Not specified

DataCite Commons: updated 2025-05-26, indexed 2025-04-15

Download link:

http://gigadb.org/dataset/100527

Description:

Word-based or 'alignment-free' sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods are able to accurately estimate phylogenetic distances between genomic sequences. One of these approaches is Filtered Spaced Word Matches. Herein, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. We compare the performance of Prot-SpaM to other alignment-free methods on simulated sequences and on various groups of eukaryotic and prokaryotic taxa. Prot-SpaM can be used to calculate high-quality phylogenetic trees for dozens of whole-proteome sequences in a matter of seconds or minutes and often outperforms other alignment-free approaches. The source code of our software is available through GitHub: https://github.com/jschellh/ProtSpaM
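The core idea of Filtered Spaced Word Matches can be illustrated with a minimal sketch. It assumes a single binary pattern whose '1' positions must match exactly and whose '0' ("don't-care") positions are compared afterwards; matches whose don't-care score does not exceed a threshold are discarded as likely background matches, and the mismatch rate over the remaining don't-care positions is turned into a distance. All function names, the toy scoring function (a stand-in for a real substitution matrix such as BLOSUM62), the threshold of 0, and the Kimura-style correction are illustrative assumptions, not Prot-SpaM's actual code, which lives in the linked repository.

```python
import math
from collections import defaultdict


def spaced_words(seq, pattern):
    """Yield (start, key) for every window of len(pattern) in seq; key keeps
    only the residues at the match positions ('1') of the binary pattern."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in match_pos)


def toy_score(x, y):
    # Hypothetical stand-in for a real substitution matrix such as BLOSUM62.
    return 2 if x == y else -1


def filtered_mismatch_rate(s1, s2, pattern, threshold=0, score=toy_score):
    """Mismatch rate at don't-care positions ('0') over all spaced-word
    matches whose don't-care score exceeds `threshold`; None if none pass."""
    index = defaultdict(list)  # spaced-word key -> start positions in s2
    for start, key in spaced_words(s2, pattern):
        index[key].append(start)
    dont_care = [i for i, c in enumerate(pattern) if c == "0"]
    width = len(pattern)
    compared = mismatches = 0
    for p1, key in spaced_words(s1, pattern):
        for p2 in index.get(key, []):
            w1, w2 = s1[p1:p1 + width], s2[p2:p2 + width]
            # Filter step: drop matches that look like random background.
            if sum(score(w1[i], w2[i]) for i in dont_care) <= threshold:
                continue
            compared += len(dont_care)
            mismatches += sum(w1[i] != w2[i] for i in dont_care)
    return mismatches / compared if compared else None


def kimura_protein_distance(p):
    """Kimura's approximation for protein distances: -ln(1 - p - 0.2 p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("mismatch rate too high for the Kimura correction")
    return -math.log(arg)


p = filtered_mismatch_rate("ACDEFG", "ACKEFG", "1001")
print(p, kimura_protein_distance(p))  # p == 0.5, distance ≈ 0.7985
```

In this toy example two spaced-word matches survive the filter, each contributing two don't-care comparisons with one mismatch, giving p = 0.5. Whether Prot-SpaM uses exactly this correction formula, scoring matrix, and threshold should be checked against the paper and source code.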

Provider:

GigaScience Database

Created:

2018-11-19