Datasets of the manuscript "Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons"

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7490324
Resource description:
DATASETS

Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons

Liliane S. Oliveira, Alejandro Reyes, Bas E. Dutilh and Arthur Gruber*
* Correspondence: argruber@usp.br (AG); Tel. +55 11 3091 7274

Here we provide the Microviridae, Flavivirus and casposon data used throughout the work:

Microviridae folder
  conserved_HMMs - profile HMMs constructed with TABAJARA in Conservation mode for Microviridae
  discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for Microviridae
  sequences - sequence datasets and their multiple sequence alignments
    Microviridae_113-seq_training_set.fasta - 113 VP1 sequences covering the diversity of the Microviridae family
    Microviridae_113-seq.aln - multiple sequence alignment of the 113-protein dataset
    Microviridae_1836-seq_testset.fasta - 1,836 sequences of the major capsid protein (VP1): 501 Alpavirinae, 1,040 Gokushovirinae and 295 Pichovirinae sequences
    Microviridae_1866-seq.aln - multiple sequence alignment of the 1,866-protein Microviridae dataset used in the experiment of Figure 4

Flavivirus folder
  conserved_HMMs - profile HMMs constructed with TABAJARA in Conservation mode for Flavivirus
  discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for Flavivirus
    full-length - models constructed from full-length protein sequences
    short - models constructed from selected short alignment blocks of the protein sequences
  sequences - sequence datasets and their multiple sequence alignments
    Flavivirus_127-seq_training_set.fasta - 127 polyprotein sequences covering the species diversity of the genus Flavivirus
    Flavivirus_127-seq.aln - multiple sequence alignment of the 127-protein dataset
    Flavivirus_6364-seq_testset.fasta - 6,364 sequences covering the species diversity of Flavivirus: 3,919 of dengue virus (DENV), 327 of Zika virus (ZIKV), 63 of yellow fever virus (YFV), and 2,055 of other available flaviviruses
    Flavivirus_6364-seq.aln - multiple sequence alignment of the 6,364-protein Flavivirus dataset

Casposons folder
  casposon_generic_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for the generic detection of all casposons and their discrimination from CRISPRs
  casposon_family_discriminative_HMMs - profile HMMs constructed with TABAJARA in Discrimination mode for discrimination among casposon families and from CRISPRs
  sequences - sequence datasets and their multiple sequence alignments
    casposons_crisprs.fasta - 106 bona fide Cas1 sequences derived from 52 CRISPRs and 54 casposons
    casposon_family_discrimination.aln - multiple sequence alignment of the 52 bona fide CRISPR and 54 casposon sequences, with the nomenclature required to run TABAJARA for the discrimination of each casposon family
    casposons_crisprs_discrimination.aln - multiple sequence alignment of the 52 bona fide CRISPR and 54 casposon sequences, with the nomenclature required to run TABAJARA for the discrimination of CRISPRs and casposons
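
The sequence counts listed above can be checked against a downloaded copy of the archive. The following is a minimal sketch and not part of the original record. It assumes the FASTA files are plain text with one ">" header line per sequence. The function names `count_fasta_records` and `check_archive` are hypothetical, and because the exact folder layout inside the archive is an assumption, files are located by name rather than by fixed path.

```python
from pathlib import Path

# Sequence counts as stated in the dataset description.
EXPECTED_COUNTS = {
    "Microviridae_113-seq_training_set.fasta": 113,
    "Microviridae_1836-seq_testset.fasta": 1836,
    "Flavivirus_127-seq_training_set.fasta": 127,
    "Flavivirus_6364-seq_testset.fasta": 6364,
    "casposons_crisprs.fasta": 106,
}


def count_fasta_records(lines):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def check_archive(root):
    """Compare record counts of files found under `root` with EXPECTED_COUNTS.

    Returns {filename: (found, expected)} for every listed file that exists.
    Files are searched recursively by name (the folder layout is an assumption).
    """
    results = {}
    for name, expected in EXPECTED_COUNTS.items():
        for path in Path(root).rglob(name):
            with open(path) as handle:
                results[name] = (count_fasta_records(handle), expected)
    return results
```

Calling `check_archive("path/to/unpacked/archive")` (a placeholder path) returns the found and expected counts per file, so any mismatch with the description can be spotted before the test sets are used.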
Created:
2022-12-29