Pseudogene identification pipeline
Source: Figshare. Updated 2020-09-13; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Pseudogene_identification_pipeline/12545984
Description:
This is a manually modified pipeline based on the program suite Psi-Phi (Lerat and Ochman, 2004), which identifies pseudogenes by running tblastn within a cluster of closely related genomes.

Input data
1. Amino acid files of all genomes, named *.faa
2. Genome FASTA files, named *.fasta (draft genome) or *.fas (complete genome)
3. GenBank files of all genomes, named *.gbk

The original Psi-Phi scripts are designed for complete genomes. For a genome assembly that has not been assembled into a single chromosome, all contigs of the genome should first be concatenated into an artificially assembled genome (*.fasta -> *.fas), which is then used as the input to Psi-Phi.

Instructions for pseudogene identification (Psi-Phi)
1. Makeblastdb for tblastn
In this step, a command line like the one below is written for each artificially assembled genome to build its nucleotide database.
    makeblastdb -in input.fas -dbtype nucl -parse_seqids -out input.fas
2. Run tblastn jobs
In this step, a command line like the one below is written to submit blast jobs for all pairs of genomes.
    tblastn -query query.faa -db target.fas -outfmt 6 -evalue 1e-15 -out query.vs.target.blast
3. Batch run module1 of Psi-Phi using batch.module1.pl
This step retrieves particular data from each blast file. The results are written to 'query.vs.target.sortie.out' files.
4. Batch run module2 of Psi-Phi using batch.module2.pl
This step determines the list and category of each pseudogene candidate using the output of the first module. Potential pseudogenes are written to 'pseudo_target_use_query' files.

Instructions for pseudogene filtering
A filtering step is then applied to remove pseudogenes that were misidentified by Psi-Phi.

Citation
If you use this modified pipeline for pseudogene identification with filtering, please cite:
Lerat E, Ochman H. Ψ-Φ: Exploring the outer limits of bacterial pseudogenes. Genome Res 2004; 14: 2273–2278.
Chu X, Li S, Wang S, Luo D, Luo H. Gene Loss through Pseudogenization Contributes to the Ecological Diversification of a Generalist Roseobacter Lineage. (unpublished)
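
The contig concatenation and the two BLAST steps described above can be sketched in Python as follows. This is an illustrative sketch, not part of the original pipeline. The function names, the header given to the concatenated record, the optional spacer between contigs, and the exclusion of self-versus-self comparisons are all assumptions. The makeblastdb and tblastn options are copied from the description, and the output name follows its 'query.vs.target.blast' pattern. The script only builds the command strings and does not run BLAST.

```python
from itertools import permutations


def concatenate_contigs(fasta_text, header, spacer=""):
    """Join every contig in a multi-FASTA string into a single record.

    Assumption: contigs are joined in file order. The optional spacer
    (for example a run of N) is hypothetical and not described in the source.
    """
    contigs, current = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the contig collected so far.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))
    sequence = spacer.join(contigs)
    # Wrap the sequence at 60 characters per line.
    wrapped = [sequence[i:i + 60] for i in range(0, len(sequence), 60)]
    return ">" + header + "\n" + "\n".join(wrapped) + "\n"


def makeblastdb_command(genome):
    """Step 1: nucleotide database for one artificially assembled genome."""
    return (f"makeblastdb -in {genome}.fas -dbtype nucl "
            f"-parse_seqids -out {genome}.fas")


def tblastn_command(query, target, evalue="1e-15"):
    """Step 2: proteins of one genome searched against another genome."""
    return (f"tblastn -query {query}.faa -db {target}.fas -outfmt 6 "
            f"-evalue {evalue} -out {query}.vs.{target}.blast")


def build_commands(genomes):
    """One makeblastdb per genome, then one tblastn per ordered pair.

    Assumption: "all pairs" means every (query, target) combination
    with query != target.
    """
    commands = [makeblastdb_command(g) for g in genomes]
    commands += [tblastn_command(q, t) for q, t in permutations(genomes, 2)]
    return commands


if __name__ == "__main__":
    # Hypothetical genome names, used only for illustration.
    for command in build_commands(["genomeA", "genomeB", "genomeC"]):
        print(command)
```

The printed lines can be written to a job script or submitted to a scheduler. For n genomes this produces n database commands and n × (n − 1) tblastn commands.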
Created:
2020-09-13



