
Data for evaluation of diverse-seq algorithms

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14052786
Description:
Phylogenetics relies on two compute-intensive steps: multiple sequence alignment and phylogeny estimation. `diverse-seq` implements computationally efficient alignment-free algorithms that support rapid prototyping of phylogenetic workflows. It can accelerate parameter searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that is representative of the diversity in a collection. `diverse-seq` can also improve the performance of phylogeny estimation by providing a seed phylogeny that a more sophisticated algorithm can then refine. The data sets in this archive are either whole microbial genomes stored in HDF5 format or multiple sequence alignments of one-to-one orthologs from mammal species in FASTA format. The `wol.dvseqs` HDF5 file is derived from the data used in Zhu et al., Nature Communications, 10(1), 5477, with the original FASTA-formatted files in `wol.zip`. The `soil.dvseqs` HDF5 file is derived from the genomes included in REFSOIL (Choi et al., The ISME Journal, 11(4), 829–834), with the original GenBank-formatted files included in `refsoil.zip`. The data in `mammal_orths_31_aligned.zip` are FASTA-formatted multiple sequence alignments of sequences sampled from Ensembl release 113.
Created:
2025-03-27
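
The FASTA-formatted alignments in an archive such as `mammal_orths_31_aligned.zip` can be inspected without unpacking it. The following is a minimal sketch using only the Python standard library, not part of `diverse-seq` itself. The helper names (`parse_fasta`, `read_alignments`, `is_aligned`) are hypothetical, and the file suffixes and the internal layout of the zip file are assumptions that may need adjusting for the real archive.

```python
import zipfile


def parse_fasta(text):
    """Return {name: sequence} parsed from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def read_alignments(zip_source, suffixes=(".fa", ".fasta")):
    """Yield (member_name, {name: sequence}) for each FASTA file in a zip archive.

    `suffixes` is an assumption about how the archive members are named.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if member.lower().endswith(suffixes):
                text = archive.read(member).decode("utf-8")
                yield member, parse_fasta(text)


def is_aligned(records):
    """Return True if all sequences have the same length, as an alignment requires."""
    return len({len(seq) for seq in records.values()}) <= 1
```

Under these assumptions, looping over `read_alignments("mammal_orths_31_aligned.zip")` would give one dictionary of aligned sequences per ortholog file, and `is_aligned` offers a quick sanity check on each before it is passed to downstream tools.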