A comprehensive test dataset for MacSyFinder v2 with TXSScan
Source: Figshare. Updated: 2022-12-13. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/A_comprehensive_test_dataset_for_MacSyFinder_v2_with_TXSScan/21716426
Description:
We present here two sample genomic datasets for testing MacSyFinder:

1. The complete genome sequence of Acinetobacter baumannii ATCC 17978 ("CP000521_proteins.fasta").
2. A set of chromosomes from three genomes ("three_genomes_proteins.fasta"), presented in a single multi-FASTA protein file following the Gembase naming convention described in MacSyFinder's documentation. The chromosomes come from Pseudomonas aeruginosa PAO1 ("PSAE001.B.00001.C001"), Burkholderia mallei ATCC 23344 chromosome 1 ("BUMA001.B.00001.C001"), and Vibrio cholerae O1 biovar El Tor str. N16961 chromosome I ("VICH001.B.00001.C001").

The files offered for download consist of:

1. The genome(s) to annotate, presented as multi-FASTA files of the proteins, ordered as the genes encoding them.
2. The output folders resulting from an annotation of the protein secretion systems and appendages on the genomes, using the MacSyFinder set of models ("macsy-model") TXSScan version 1.1.1 with different options.

The following steps and command lines were used to obtain the output files.

First, the genomes are downloaded from the present page.

Second, the MacSyFinder program is installed. The installation procedure is described at: https://github.com/gem-pasteur/macsyfinder

Third, the TXSScan models for the annotation of secretion systems are installed with the following command line:

> macsydata install TXSScan # Install the latest version of TXSScan

Finally, MacSyFinder is run on the genome datasets, here using 8 workers to speed up the HMM search ("-w 8" option).

On the A. baumannii genome, using the "ordered" mode to search all systems in TXSScan:

> macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan_CP000521_ordered --models TXSScan all --db-type ordered_replicon -w 8 # specified output folder: macsyfinder_TXSScan_CP000521_ordered

On the A. baumannii genome, using the "unordered" mode to search all systems in TXSScan:

> macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan_CP000521_unordered --models TXSScan all --db-type unordered -w 8 # specified output folder: macsyfinder_TXSScan_CP000521_unordered

On the A. baumannii genome, using the "ordered" mode to search all the systems of diderm bacteria in TXSScan:

> macsyfinder --sequence-db CP000521_proteins.fasta --db-type ordered_replicon -o macsyfinder_TXSScan-diderm_CP000521_ordered --models TXSScan/bacteria/diderm all -w 8 # specified output folder: macsyfinder_TXSScan-diderm_CP000521_ordered

On the A. baumannii genome, using the "ordered" mode to search only the T1SS system in TXSScan:

> macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan-T1SS_CP000521_ordered --models TXSScan bacteria/diderm/T1SS --db-type ordered_replicon -w 8 # specified output folder: macsyfinder_TXSScan-T1SS_CP000521_ordered

On the Gembase dataset containing three ordered genomes, to search all systems in TXSScan:

> macsyfinder --sequence-db three_genomes_proteins.fasta --db-type gembase -o macsyfinder_TXSScan_ThreeGenomes --models TXSScan all -w 8 # specified output folder: macsyfinder_TXSScan_ThreeGenomes

The documentation on the generated output files can be consulted here.
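
Before running the "gembase" mode, it can be useful to check that the multi-FASTA file really groups its proteins by replicon. The sketch below is not part of the dataset or of MacSyFinder. It assumes that each sequence identifier has the form "REPLICON_PROTEINID" (replicon name, an underscore, then a protein identifier), which is how we understand the Gembase naming convention mentioned above; the helper name count_per_replicon is hypothetical.

```shell
#!/bin/sh
# Count proteins per replicon in a Gembase-style multi-FASTA file.
# ASSUMPTION: every header looks like ">REPLICON_PROTEINID [description]",
# so the replicon name is everything before the last underscore of the ID.
count_per_replicon() {
    # 1. keep header lines only
    # 2. drop the leading ">", any description after the first whitespace,
    #    and the trailing "_PROTEINID" part
    # 3. count identical replicon names and print "name<TAB>count"
    grep '^>' "$1" |
        sed -e 's/^>//' -e 's/[[:space:]].*$//' -e 's/_[^_]*$//' |
        sort | uniq -c | awk '{print $2 "\t" $1}'
}
```

Calling count_per_replicon three_genomes_proteins.fasta should then list one line per replicon name, if the identifiers follow the assumed layout. Identifiers without any underscore are reported unchanged, which makes a file that does not follow the convention easy to spot.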
Created:
2022-12-13



