Datasets of the manuscript "Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons"
Indexed in the NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7490324
Resource description:
DATASETS
Rational design of profile HMMs for sensitive and specific sequence detection with case studies applied to viruses, bacteriophages, and casposons
Liliane S. Oliveira, Alejandro Reyes, Bas E. Dutilh and Arthur Gruber*
* Correspondence: argruber@usp.br (AG); Tel. +55 11 3091 7274
Here we provide the Microviridae, Flavivirus, and casposon datasets used throughout the work:
Microviridae folder
conserved_HMMs – profile HMMs constructed with TABAJARA in Conservation mode for Microviridae
discriminative_HMMs – profile HMMs constructed with TABAJARA in Discrimination mode for Microviridae
sequences – sequence datasets and their respective multiple sequence alignments
Microviridae_113-seq_training_set.fasta – 113 VP1 sequences covering the diversity of the Microviridae family
Microviridae_113-seq.aln – multiple sequence alignment of the 113-protein dataset
Microviridae_1836-seq_testset.fasta – 1,836 sequences of the major capsid protein (VP1), comprising 501 Alpavirinae, 1,040 Gokushovirinae, and 295 Pichovirinae sequences
Microviridae_1866-seq.aln – multiple sequence alignment of the 1,866-protein Microviridae dataset used in the experiment of Figure 4
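As a minimal sketch for sanity-checking the FASTA files listed above, the Python below parses records and counts them, optionally tallying headers by subfamily name. The tallying step is hypothetical: it assumes subfamily names such as "Alpavirinae" appear in the FASTA headers, which this description does not confirm.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Multi-line sequences are joined; blank lines and any text before
    the first '>' header are skipped.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_by_label(records: List[Tuple[str, str]], labels: List[str]) -> Counter:
    """Count records whose header mentions each label (first match wins).

    Hypothetical convention: assumes subfamily names appear in headers.
    """
    counts: Counter = Counter()
    for header, _sequence in records:
        for label in labels:
            if label in header:
                counts[label] += 1
                break
    return counts
```

For example, `with open("Microviridae_1836-seq_testset.fasta") as handle: records = list(read_fasta(handle))` yields a list whose length can be compared with the 1,836 sequences stated above.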
Flavivirus folder
conserved_HMMs – profile HMMs constructed with TABAJARA in Conservation mode for Flavivirus
discriminative_HMMs – profile HMMs constructed with TABAJARA in Discrimination mode for Flavivirus
full-length – models constructed from full-length protein sequences
short – models constructed from selected short alignment blocks of the protein sequences
sequences – sequence datasets and their respective multiple sequence alignments
Flavivirus_127-seq_training_set.fasta – 127 polyprotein sequences covering the species diversity of the genus Flavivirus
Flavivirus_127-seq.aln – multiple sequence alignment of the 127-protein dataset
Flavivirus_6364-seq_testset.fasta – 6,364 sequences covering the species diversity of Flavivirus: 3,919 from dengue virus (DENV), 327 from Zika virus (ZIKV), 63 from yellow fever virus (YFV), and 2,055 from other available flaviviruses
Flavivirus_6364-seq.aln – multiple sequence alignment of the 6,364-protein Flavivirus dataset
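The "short" models above are built from selected short alignment blocks. The sketch below illustrates one generic way to pick such blocks: score each alignment column by the fraction of sequences sharing its most common residue, then keep runs of high-scoring columns. This identity rule, the `threshold` and `min_len` parameters, and the input format (a list of equal-length aligned strings; the exact `.aln` format is not stated here) are all assumptions for illustration, not TABAJARA's own scoring.

```python
from collections import Counter
from typing import List, Tuple


def column_identity(alignment: List[str]) -> List[float]:
    """Fraction of all sequences carrying the most common non-gap residue per column.

    Gaps ('-' or '.') never count toward the consensus, so gappy columns score low.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [row[i].upper() for row in alignment if row[i] not in "-."]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_blocks(scores: List[float], threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of at least min_len columns scoring >= threshold."""
    blocks, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(scores) - start >= min_len:
        blocks.append((start, len(scores)))
    return blocks
```

Raising `threshold` yields fewer, more conserved blocks; raising `min_len` discards short islands of conservation.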
Casposons folder
casposon_generic_HMMs – profile HMMs constructed with TABAJARA in Discrimination mode for the generic detection of all casposons and discrimination from CRISPRs.
casposon_family_discriminative_HMMs – profile HMMs constructed with TABAJARA in Discrimination mode for the specific discrimination among casposon families and from CRISPRs.
sequences – sequence datasets and their respective multiple sequence alignments
casposons_crisprs.fasta – 106 bona fide Cas1 sequences derived from 52 CRISPRs and 54 casposons
casposon_family_discrimination.aln – multiple sequence alignment of the 52 bona fide CRISPR and 54 casposon sequences, named with the appropriate nomenclature for running TABAJARA to discriminate each casposon family.
casposons_crisprs_discrimination.aln – multiple sequence alignment of the 52 bona fide CRISPR and 54 casposon sequences, named with the appropriate nomenclature for running TABAJARA to discriminate CRISPRs from casposons.
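The discrimination alignments separate sequences into groups by name. The sketch below illustrates the general idea of discriminative positions with a simple frequency rule: keep columns where one group's consensus residue is frequent within that group but rare in the other. The rule, its cut-offs (`min_target`, `max_other`), and the name prefix used for grouping (for example a hypothetical `CASPOSON_`) are assumptions; the actual nomenclature and TABAJARA's Discrimination-mode scoring are not described here.

```python
from collections import Counter
from typing import List, Tuple


def consensus(column: List[str]) -> Tuple[str, float]:
    """Most common non-gap residue in a column and its frequency over all rows."""
    residues = [c.upper() for c in column if c not in "-."]
    if not residues:
        return "-", 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(column)


def split_groups(records: List[Tuple[str, str]], target_prefix: str) -> Tuple[List[str], List[str]]:
    """Split (name, aligned_sequence) pairs by a hypothetical name prefix."""
    target = [seq for name, seq in records if name.startswith(target_prefix)]
    other = [seq for name, seq in records if not name.startswith(target_prefix)]
    return target, other


def discriminative_columns(target: List[str], other: List[str],
                           min_target: float = 0.9,
                           max_other: float = 0.1) -> List[Tuple[int, str]]:
    """Columns whose target-group consensus residue is frequent in the target
    group (>= min_target) but rare in the other group (<= max_other)."""
    if not target or not other:
        raise ValueError("both groups must contain at least one sequence")
    hits = []
    for i in range(len(target[0])):
        residue, freq = consensus([row[i] for row in target])
        if residue == "-" or freq < min_target:
            continue
        other_freq = sum(row[i].upper() == residue for row in other) / len(other)
        if other_freq <= max_other:
            hits.append((i, residue))
    return hits
```

Columns returned for one group, but not shared with the other, are the kind of signal a discriminative profile would try to capture.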
Created:
2022-12-29



