Data for evaluation of diverse-seq algorithms
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/14052786
Description:
The two algorithms required for phylogenetics, multiple sequence alignment and phylogeny estimation, are both compute-intensive. `diverse-seq` implements computationally efficient, alignment-free algorithms that support rapid prototyping of phylogenetic workflows. It can accelerate parameter searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that is representative of the diversity in a collection. `diverse-seq` can also improve the performance of phylogeny estimation by providing a seed phylogeny that a more sophisticated algorithm can then refine.
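To make the idea of an alignment-free representative subset concrete, here is a minimal Python sketch. It is not the `diverse-seq` implementation: it assumes k-mer frequency profiles compared by Euclidean distance and a greedy farthest-point selection, and the function names `kmer_profile`, `profile_distance` and `select_representatives` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq, k=3):
    """Return the relative frequency of each k-mer in seq (no alignment needed)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles (an assumed measure)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def select_representatives(seqs, n, k=3):
    """Greedily pick n sequence names that are far apart in k-mer space."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    names = sorted(profiles)
    if n >= len(names):
        return names
    # Seed the selection with the most distant pair of sequences.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: profile_distance(profiles[pair[0]], profiles[pair[1]]),
    )
    chosen = [first, second][:n]
    # Repeatedly add the sequence farthest from everything chosen so far.
    while len(chosen) < n:
        remaining = [x for x in names if x not in chosen]
        best = max(
            remaining,
            key=lambda x: min(
                profile_distance(profiles[x], profiles[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen
```

Because the profiles are computed independently per sequence, no alignment is needed before selection. The selected subset could then be passed to an alignment or tree-building step of your choice.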
The data sets in this archive are either whole microbial genomes stored in HDF5 format or multiple sequence alignments, in FASTA format, of one-to-one orthologs from mammal species. The `wol.dvseqs` HDF5 file is derived from the data used in Zhu et al., Nature Communications, 10(1), 5477, with the original FASTA-formatted files in `wol.zip`. The `soil.dvseqs` HDF5 file is derived from the genomes included in REFSOIL (Choi et al., The ISME Journal, 11(4), 829–834), with the original GenBank-formatted files included in `refsoil.zip`. The data in `mammal_orths_31_aligned.zip` are FASTA-formatted multiple sequence alignments of sequences sampled from Ensembl release 113.
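As a rough illustration of how the FASTA alignments in a zip archive such as `mammal_orths_31_aligned.zip` might be read without unpacking it, the sketch below uses only the Python standard library. The internal layout of the archive is not described here, so the file suffixes (`.fa`, `.fasta`) are an assumption, and `parse_fasta` and `iter_alignments` are hypothetical helper names.

```python
import io
import zipfile


def parse_fasta(handle):
    """Parse FASTA text from a line iterator into {name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def iter_alignments(zip_path, suffixes=(".fa", ".fasta")):
    """Yield (member_name, {seq_name: aligned_seq}) for each FASTA file.

    The suffixes are assumed; adjust them to match the archive contents.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(suffixes):
                with archive.open(member) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    yield member, parse_fasta(text)
```

Since each file is a multiple sequence alignment, a simple sanity check is that all sequences within one file have the same length, gaps included.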
Created:
2025-03-27



