A comprehensive test dataset for MacSyFinder v2 with TXSScan
Source: DataCite Commons | Updated: 2022-12-13 | Indexed: 2024-08-18
Download link:
https://figshare.com/articles/dataset/A_comprehensive_test_dataset_for_MacSyFinder_v2_with_TXSScan/21716426
Description:
We present here a sample of two genomic datasets on which to test MacSyFinder:

1. The complete genome sequence of Acinetobacter baumannii ATCC 17978, "CP000521_proteins.fasta".
2. A set of chromosomes from 3 genomes, "three_genomes_proteins.fasta", presented in a single multi-FASTA protein file following the Gembase naming convention described in MacSyFinder's documentation. The genomes are Pseudomonas aeruginosa PAO1 ("PSAE001.B.00001.C001"), Burkholderia mallei ATCC 23344 chromosome 1 ("BUMA001.B.00001.C001"), and Vibrio cholerae O1 biovar El Tor str. N16961 chromosome I ("VICH001.B.00001.C001").

The files offered for download consist of:

1. The genome(s) to annotate, presented as multi-FASTA files of the proteins ordered as the genes encoding them.
2. The output folders resulting from an annotation of the protein secretion systems and appendages on the genomes, using the MacSyFinder set of models ("macsy-model") TXSScan version 1.1.1 with different options.

The following command lines were used to obtain the output files.

First, the genomes are downloaded from the present page.

Second, the MacSyFinder program is installed. See the installation procedure at https://github.com/gem-pasteur/macsyfinder

Third, the TXSScan models for annotation of secretion systems are installed with the following command line:

    > macsydata install TXSScan
    # Install the latest version of TXSScan

Finally, MacSyFinder is run on the genome datasets, here using 8 workers to speed up the HMM search ("-w 8" option).

On the A. baumannii genome, using the "ordered" mode to search all systems in TXSScan:

    > macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan_CP000521_ordered --models TXSScan all --db-type ordered_replicon -w 8
    # specified output folder: macsyfinder_TXSScan_CP000521_ordered

On the A. baumannii genome, using the "unordered" mode to search all systems in TXSScan:

    > macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan_CP000521_unordered --models TXSScan all --db-type unordered -w 8
    # specified output folder: macsyfinder_TXSScan_CP000521_unordered

On the A. baumannii genome, using the "ordered" mode to search all the systems of diderm bacteria in TXSScan:

    > macsyfinder --sequence-db CP000521_proteins.fasta --db-type ordered_replicon -o macsyfinder_TXSScan-diderm_CP000521_ordered --models TXSScan/bacteria/diderm all -w 8
    # specified output folder: macsyfinder_TXSScan-diderm_CP000521_ordered

On the A. baumannii genome, using the "ordered" mode to search only the T1SS system in TXSScan:

    > macsyfinder --sequence-db CP000521_proteins.fasta -o macsyfinder_TXSScan-T1SS_CP000521_ordered --models TXSScan bacteria/diderm/T1SS --db-type ordered_replicon -w 8
    # specified output folder: macsyfinder_TXSScan-T1SS_CP000521_ordered

On the Gembase dataset containing three ordered genomes, to search all systems in TXSScan:

    > macsyfinder --sequence-db three_genomes_proteins.fasta --db-type gembase -o macsyfinder_TXSScan_ThreeGenomes --models TXSScan all -w 8
    # specified output folder: macsyfinder_TXSScan_ThreeGenomes

The generated output files are described in the MacSyFinder documentation.
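To see how the three genomes can be told apart within the single file "three_genomes_proteins.fasta", the sketch below groups FASTA records by replicon name. It assumes, as we understand the Gembase convention, that each sequence identifier is a replicon name followed by an underscore and a protein identifier (for example "PSAE001.B.00001.C001_00001"). The function names, protein numbers and sequences in the example are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable


def replicon_of(header: str) -> str:
    """Return the replicon name from a FASTA header line.

    Assumption: the identifier looks like '<replicon>_<protein id>', so the
    replicon is everything before the last underscore of the first token.
    An identifier without any underscore yields an empty string.
    """
    seq_id = header.lstrip(">").split()[0]
    replicon, _, _ = seq_id.rpartition("_")
    return replicon


def count_proteins_per_replicon(lines: Iterable[str]) -> Counter:
    """Count FASTA records per replicon from an iterable of lines."""
    return Counter(replicon_of(line) for line in lines if line.startswith(">"))


# Hypothetical records: protein numbers and sequences are made up.
example = [
    ">PSAE001.B.00001.C001_00001 hypothetical annotation",
    "MKT",
    ">PSAE001.B.00001.C001_00002",
    "MSL",
    ">VICH001.B.00001.C001_00001",
    "MAA",
]
counts = count_proteins_per_replicon(example)
```

On the downloaded file, the same function can be applied to an open file handle, e.g. `with open("three_genomes_proteins.fasta") as fh: print(count_proteins_per_replicon(fh))`. If the assumed identifier layout holds, the three replicon names listed in the description should appear as keys.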
Provider:
figshare
Created:
2022-12-13



