Pseudogene identification pipeline
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Pseudogene_identification_pipeline/12545984
Description:
This is a manually modified pipeline based on the program suite Psi-Phi (Lerat and Ochman, 2004), which identifies pseudogenes by running tblastn searches within a cluster of closely related genomes.
Input data
1. Amino acid files of all genomes, named *.faa
2. Genome FASTA files, named *.fasta (draft genome) or *.fas (complete genome)
3. GenBank files of all genomes, named *.gbk
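Because the pipeline locates its inputs by file extension, it can help to check that every genome has a complete set of files before starting. The sketch below is a minimal, hypothetical helper (check_inputs is not part of Psi-Phi) and assumes that all files of one genome share the same base name, e.g. G1.faa, G1.gbk and G1.fasta.

```python
from collections import defaultdict
from pathlib import Path

# Extensions every genome needs, per the input description.
REQUIRED = {".faa", ".gbk"}
# A genome sequence may be draft (.fasta) or complete (.fas).
GENOME_SUFFIXES = {".fasta", ".fas"}


def check_inputs(filenames):
    """Group files by base name and report the inputs each genome lacks.

    Hypothetical helper: assumes all files of one genome share a stem.
    Returns {genome: [missing extensions]} for incomplete genomes only.
    """
    by_genome = defaultdict(set)
    for name in filenames:
        path = Path(name)
        by_genome[path.stem].add(path.suffix)

    problems = {}
    for genome, suffixes in sorted(by_genome.items()):
        missing = sorted(REQUIRED - suffixes)
        if not suffixes & GENOME_SUFFIXES:
            missing.append(".fasta or .fas")
        if missing:
            problems[genome] = missing
    return problems
```

For example, check_inputs(["G1.faa", "G1.gbk", "G1.fas", "G2.faa", "G2.fasta"]) reports only that G2 lacks its .gbk file.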
The original Psi-Phi scripts are designed for complete genomes. For genomes that have not been fully assembled into a single chromosome, all contigs of each genome must first be concatenated into an artificially assembled genome (*.fasta->*.fas), which is then used as the input of Psi-Phi.
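The concatenation step (*.fasta->*.fas) can be sketched in Python as follows. This is an illustrative sketch, not the authors' script: the function names are hypothetical, contigs are joined end to end in file order with no spacer between them (the description does not say whether a spacer is used), and the single output record is named after the file's base name.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_contigs(fasta_text, genome_id, width=60):
    """Join all contigs end to end into one artificial sequence (no spacer)."""
    seq = "".join(s for _, s in read_fasta(fasta_text))
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + genome_id + "\n" + "\n".join(lines) + "\n"


def fasta_to_fas(path):
    """Write <name>.fas next to a draft <name>.fasta file and return its path."""
    src = Path(path)
    out = src.with_suffix(".fas")
    out.write_text(concatenate_contigs(src.read_text(), src.stem))
    return out
```

Keeping the record ID short and free of spaces (here the file's base name) suits the -parse_seqids option used in the makeblastdb step.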
Instructions for pseudogene identification (Psi-Phi)
1. Run makeblastdb to build databases for tblastn
In this step, a nucleotide database is built for each artificially assembled genome with the following command:
makeblastdb -in input.fas -dbtype nucl -parse_seqids -out input.fas
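To build a database for every artificially assembled genome, the command above can be generated once per *.fas file. The sketch below assumes BLAST+ is installed and on PATH and that all *.fas files sit in one directory; makeblastdb_commands and run_all are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def makeblastdb_commands(fas_files):
    """Build one makeblastdb command (as an argument list) per *.fas file."""
    return [
        ["makeblastdb", "-in", str(f), "-dbtype", "nucl",
         "-parse_seqids", "-out", str(f)]
        for f in sorted(fas_files)
    ]


def run_all(directory="."):
    """Run makeblastdb for every *.fas file in a directory."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found on PATH")
    for cmd in makeblastdb_commands(Path(directory).glob("*.fas")):
        # check=True stops the batch at the first failing genome.
        subprocess.run(cmd, check=True)
```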
2. Run tblastn jobs
In this step, tblastn jobs are submitted for all pairs of genomes with commands of the following form:
tblastn -query query.faa -db target.fas -outfmt 6 -evalue 1e-15 -out query.vs.target.blast
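The all-pairs tblastn step can be sketched as below. It assumes each genome is referred to by a base name shared by its .faa and .fas files, and that every ordered pair of distinct genomes is searched (genome A's proteins against genome B and vice versa) with self-comparisons skipped; the description does not state either choice explicitly. The commands are only built here, so they can be written to a job script or passed to subprocess.run.

```python
from itertools import permutations


def tblastn_commands(genomes, evalue="1e-15"):
    """One tblastn command per ordered pair (query != target) of genome names."""
    cmds = []
    for query, target in permutations(sorted(genomes), 2):
        cmds.append([
            "tblastn", "-query", f"{query}.faa", "-db", f"{target}.fas",
            "-outfmt", "6", "-evalue", evalue,
            # Output naming follows the query.vs.target.blast pattern above.
            "-out", f"{query}.vs.{target}.blast",
        ])
    return cmds
```

With n genomes this yields n × (n − 1) searches, so three genomes give six tblastn jobs.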
3. Batch run module1 of Psi-Phi using batch.module1.pl
This step extracts the required data from each BLAST output file; results are written to 'query.vs.target.sortie.out' files.
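Module1 works from tblastn's tabular output (-outfmt 6), whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below only shows how those columns can be read in Python; it is not the logic of module1, and best_hit_per_query is a hypothetical example of the kind of selection such a step might perform.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of BLAST -outfmt 6 (the 12 default columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # tblastn reports sstart > send for hits on the minus strand.
        return "+" if self.sstart <= self.send else "-"


CASTS = [str, str, float, int, int, int, int, int, int, int, float, float]


def parse_outfmt6(text):
    """Parse tabular BLAST output into Hit records, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hits.append(Hit(*(cast(v) for cast, v in zip(CASTS, fields))))
    return hits


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query protein (hypothetical filter)."""
    best = {}
    for h in hits:
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    return best
```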
4. Batch run module2 of Psi-Phi using batch.module2.pl
This step uses the output of the first module to compile the list of pseudogene candidates and assign each a category. Potential pseudogenes are written to 'pseudo_target_use_query' files.
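One disruption often used to flag a candidate pseudogene is a premature in-frame stop codon in the genomic region matched by a functional homolog. The sketch below illustrates that single check on a tblastn hit; it is an assumption for illustration and does not reproduce module2's criteria or its categories. Coordinates follow tblastn's convention (1-based, inclusive, with sstart > send on the minus strand).

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def subject_region(genome, sstart, send):
    """Return the hit region (1-based, inclusive) read in the hit's direction."""
    if sstart <= send:
        return genome[sstart - 1:send].upper()
    # Minus-strand hit: take the forward slice, then reverse-complement it.
    return genome[send - 1:sstart].translate(COMPLEMENT)[::-1].upper()


def internal_stops(region):
    """0-based codon indices of in-frame stop codons, ignoring the last codon."""
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    return [i for i, c in enumerate(codons[:-1]) if c in STOPS]
```

A non-empty result from internal_stops means the matched region cannot encode the full homologous protein on that frame, which is one reason a hit might be kept as a pseudogene candidate.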
Instructions for pseudogene filtering
A filtering step is then applied to remove pseudogenes misidentified by Psi-Phi.
Citation
If you use this modified pipeline for pseudogene identification with filtering, please cite:
Lerat E, Ochman H. Ψ-Φ: Exploring the outer limits of bacterial pseudogenes. Genome Res 2004; 14: 2273–2278.
Chu X, Li S, Wang S, Luo D, Luo H. Gene Loss through Pseudogenization Contributes to the Ecological Diversification of a Generalist Roseobacter Lineage. (unpublished)
Created:
2020-09-13



