Pseudogene identification pipeline

Source: DataCite Commons. Updated: 2020-09-13. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Pseudogene_identification_pipeline/12545984
Description:
This is a manually modified pipeline based on the program suite Psi-Phi (Lerat and Ochman, 2004), which identifies pseudogenes by running tblastn within a cluster of closely related genomes.

Input data
1. Amino acid files of all genomes, named *.faa
2. Genome FASTA files, named *.fasta (draft genome) or *.fas (complete genome)
3. GenBank files of all genomes, named *.gbk

The original Psi-Phi scripts are designed for complete genomes. For a genome assembly that has not been assembled into a single chromosome, all contigs of the genome should first be concatenated into an artificially assembled genome (*.fasta -> *.fas), which then serves as the input to Psi-Phi.

Instructions for pseudogene identification (Psi-Phi)
1. Run makeblastdb for tblastn
Use the command line below to build a nucleotide database for each artificially assembled genome.
    makeblastdb -in input.fas -dbtype nucl -parse_seqids -out input.fas
2. Run tblastn jobs
Use the command line below to submit a BLAST job for every pair of genomes.
    tblastn -query query.faa -db target.fas -outfmt 6 -evalue 1e-15 -out query.vs.target.blast
3. Batch run module 1 of Psi-Phi using batch.module1.pl
This step retrieves particular data from each BLAST output file. The results are written to 'query.vs.target.sortie.out' files.
4. Batch run module 2 of Psi-Phi using batch.module2.pl
This step determines the list and category of pseudogene candidates from the output of the first module. Potential pseudogenes are written to 'pseudo_target_use_query' files.

Instructions for pseudogene filtering
A filtering step is then applied to remove pseudogenes misidentified by Psi-Phi.

Citation
If you use this modified pipeline for pseudogene identification with filtering, please cite:
Lerat E, Ochman H. Ψ-Φ: Exploring the outer limits of bacterial pseudogenes. Genome Res 2004; 14: 2273–2278.
Chu X, Li S, Wang S, Luo D, Luo H. Gene Loss through Pseudogenization Contributes to the Ecological Diversification of a Generalist Roseobacter Lineage. (unpublished)
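
The contig concatenation and the first two Psi-Phi steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's own code. The function names and genome identifiers are hypothetical. Contigs are assumed to be joined in file order with no spacer. "All pairs" is assumed to mean every ordered pair of distinct genomes. Input files are assumed to be named after a genome identifier (for example genomeA.faa and genomeA.fas). The sketch only builds the command lines shown in the description and does not run BLAST.

```python
from itertools import permutations


def concatenate_contigs(fasta_text, genome_id, width=60):
    """Join all contig sequences of a multi-FASTA string into one record.

    Assumption: contigs are joined in file order, with no spacer between them.
    """
    parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            parts.append(line)
    sequence = "".join(parts)
    # Wrap the concatenated sequence at a fixed line width.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + genome_id + "\n" + "\n".join(wrapped) + "\n"


def makeblastdb_command(genome_fas):
    """Step 1: nucleotide database for one artificially assembled genome."""
    return ["makeblastdb", "-in", genome_fas, "-dbtype", "nucl",
            "-parse_seqids", "-out", genome_fas]


def tblastn_command(query, target):
    """Step 2: search the proteins of `query` against the genome of `target`."""
    return ["tblastn", "-query", query + ".faa", "-db", target + ".fas",
            "-outfmt", "6", "-evalue", "1e-15",
            "-out", query + ".vs." + target + ".blast"]


def all_pair_commands(genomes):
    """One tblastn job per ordered pair of distinct genomes (an assumption)."""
    return [tblastn_command(q, t) for q, t in permutations(genomes, 2)]


if __name__ == "__main__":
    # Hypothetical two-contig draft genome and two hypothetical genome IDs.
    draft = ">contig1\nATGC\nGG\n>contig2\nTTAA\n"
    print(concatenate_contigs(draft, "genomeA"), end="")
    for genome in ["genomeA", "genomeB"]:
        print(" ".join(makeblastdb_command(genome + ".fas")))
    for command in all_pair_commands(["genomeA", "genomeB"]):
        print(" ".join(command))
```

Each command is returned as an argument list, so it can be printed into a job script or passed to `subprocess.run` on a machine where BLAST+ is installed. Whether a spacer should be placed between contigs depends on how the downstream scripts treat hits that span contig boundaries, which the description does not specify.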
Provider:
figshare
Created:
2020-09-13