
Phylo-k-mers databases for SHERPAS

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.r7sqv9s85
Description:
SHERPAS is a new program that identifies novel recombinant sequences in a large collection of viral sequences and provides a first estimate of their recombinant structure. SHERPAS is much faster than other recombination-detection software; its main feature is the use of a pre-computed database of "phylogenetically-informed k-mers" (or phylo-k-mers). Computing this phylo-k-mer database is computationally heavy, but it only needs to be done once for a given reference alignment. A phylo-k-mer database can be built from any reference alignment, together with a phylogenetic tree built from that alignment, using RAPPAS2 (https://github.com/phylo42/rappas2).

We provide here three ready-to-use databases, one for each of three reference alignments:
- An alignment of 167 sequences of the pol region of the HIV genome, provided with the program SCUEAL, accessible at https://github.com/spond/SCUEAL/blob/master/data/pol2009.nex
- An alignment of 339 sequences of the whole HBV genome, provided with the program jpHMM, accessible at http://jphmm.gobics.de/download.html
- An alignment of 881 sequences of the whole HIV genome, also provided with jpHMM, accessible at http://jphmm.gobics.de/download.html

For each of these alignments, we provide a .zip file containing three files: the phylo-k-mer database (.rps file), the reference phylogenetic tree used to build the database (.tree file), and a table associating each reference sequence with a strain of the virus (.csv file). Details on the construction of the database and of the tree, and on the origin of the information reported in the table, can be found in the Supplementary Materials associated with the original Bioinformatics publication.
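After downloading one of the archives, it can be useful to check that it holds the three file types listed above before pointing SHERPAS at it. The following is a minimal sketch using only the Python standard library. The function name `locate_database_files` and the archive name in the usage note are hypothetical; only the three extensions (.rps, .tree, .csv) come from the description, and the internal layout of the archives is not assumed.

```python
import zipfile
from pathlib import PurePosixPath

# File extensions named in the dataset description.
EXPECTED_EXTENSIONS = (".rps", ".tree", ".csv")


def locate_database_files(zip_path):
    """Return {extension: archive member name} for the three expected files.

    Hypothetical helper: it only inspects member names and does not parse
    the database, the tree, or the table.
    """
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            member = PurePosixPath(name)
            # Skip directory entries and hidden files (e.g. macOS "._" stubs).
            if name.endswith("/") or member.name.startswith("."):
                continue
            suffix = member.suffix.lower()
            # Keep the first member seen for each expected extension.
            if suffix in EXPECTED_EXTENSIONS and suffix not in found:
                found[suffix] = name
    missing = [ext for ext in EXPECTED_EXTENSIONS if ext not in found]
    if missing:
        raise ValueError(f"archive is missing expected file types: {missing}")
    return found
```

Calling `locate_database_files("hbv_database.zip")` (a placeholder file name) returns the member names to extract. How the .rps file is then passed to SHERPAS is covered by the SHERPAS documentation and is not shown here.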
Created:
2021-07-21