Data for evaluation of diverse-seq algorithms

Indexed by NIAID Data Ecosystem, 2026-05-02

Download link:

https://zenodo.org/record/14052786



Description:

The two algorithms central to phylogenetics, multiple sequence alignment and phylogeny estimation, are both computationally intensive. `diverse-seq` implements computationally efficient, alignment-free algorithms that enable rapid prototyping of phylogenetic workflows. It can accelerate parameter-selection searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that is representative of the diversity in a collection. `diverse-seq` can further improve the performance of phylogenetic estimation by providing a seed phylogeny that a more sophisticated algorithm can then refine. The data sets in this archive are either whole microbial genomes stored in HDF5 format or multiple sequence alignments, in FASTA format, of one-to-one orthologs from mammal species. The `wol.dvseqs` HDF5 file is derived from the data used in Zhu et al., Nature Communications, 10(1), 5477; the original FASTA-formatted files are in `wol.zip`. The `soil.dvseqs` HDF5 file is derived from the genomes included in REFSOIL (Choi et al., The ISME Journal, 11(4), 829–834); the original GenBank-formatted files are in `refsoil.zip`. The data in `mammal_orths_31_aligned.zip` are FASTA-formatted multiple sequence alignments of sequences sampled from Ensembl release 113.
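This page does not describe the criterion `diverse-seq` uses to pick a representative subset, so the sketch below swaps in a generic alignment-free technique to illustrate the idea: each sequence is reduced to a k-mer frequency profile, and sequences are chosen greedily by max-min (farthest-point) selection, so that each new pick is as far as possible from everything already chosen. The function names, the choice of k = 3, and the Euclidean profile distance are all hypothetical choices for illustration, not the `diverse-seq` API or algorithm.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers, ignoring alignment gaps and k-mers with non-ACGT characters."""
    seq = seq.upper().replace("-", "")
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def profile_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between k-mer frequency vectors (counts scaled to proportions)."""
    total_a = sum(a.values()) or 1  # avoid division by zero for very short sequences
    total_b = sum(b.values()) or 1
    keys = set(a) | set(b)
    return math.sqrt(sum((a[x] / total_a - b[x] / total_b) ** 2 for x in keys))


def select_diverse(seqs: dict, n: int, k: int = 3) -> list:
    """Hypothetical helper: greedy max-min selection of n sequence names."""
    names = sorted(seqs)
    if n <= 0:
        return []
    if n >= len(names):
        return names
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    dist = {(x, y): profile_distance(profiles[x], profiles[y]) for x in names for y in names}
    # Seed with the sequence whose summed distance to all others is largest.
    first = max(names, key=lambda x: sum(dist[x, y] for y in names))
    chosen = [first]
    while len(chosen) < n:
        remaining = [x for x in names if x not in chosen]
        # Add the candidate whose nearest already-chosen sequence is farthest away.
        best = max(remaining, key=lambda x: min(dist[x, c] for c in chosen))
        chosen.append(best)
    return chosen
```

For example, given four toy sequences where `"a"` and `"a2"` differ by a single base, `select_diverse(seqs, 3)` keeps only one of that near-duplicate pair. Max-min selection is cheap (no alignment is ever computed) and tends to cover the extremes of a collection, which is the property a subset used for parameter searches needs.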
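How `diverse-seq` builds its seed phylogeny is not stated here. As a hedged illustration of turning alignment-free distances into a quick starting tree, the sketch below uses UPGMA, a classic average-linkage clustering method, substituted in purely for illustration; the `upgma` function and its input layout (a list of names plus a symmetric distance matrix) are hypothetical.

```python
def upgma(names: list, dist: list) -> str:
    """Hypothetical helper: rooted UPGMA tree from a symmetric distance matrix, as Newick."""
    if not names:
        raise ValueError("need at least one sequence")
    # Active clusters: id -> (Newick subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair (ties broken by lowest ids for determinism).
        (i, j), dij = min(d.items(), key=lambda item: (item[1], item[0]))
        ni, si, h_i = clusters.pop(i)
        nj, sj, h_j = clusters.pop(j)
        h = dij / 2  # height of the new internal node under a molecular-clock assumption
        newick = f"({ni}:{h - h_i:.4g},{nj}:{h - h_j:.4g})"
        # Size-weighted average distance from the merged cluster to each remaining one.
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        # Drop every distance that involves the two merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        clusters[next_id] = (newick, si + sj, h)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"
```

For three taxa where `a` and `b` are at distance 2 and both are at distance 6 from `c`, this returns `(c:3,(a:1,b:1):2);`. The distance matrix could come from any alignment-free measure, such as distances between k-mer frequency profiles. UPGMA assumes a constant rate of evolution, so its output is best treated as a rough starting topology for a more rigorous method to refine.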

Created:

2025-03-27