Orthogroup functional annotations and effector candidates

Name: Orthogroup functional annotations and effector candidates
Creator: figshare
Published: 2021-11-01 08:57:40
License: No description available

DataCite Commons. Updated: 2021-11-01. Indexed: 2024-08-17.

Download link: https://figshare.com/articles/dataset/Orthogroup_functional_annotations_and_effector_candidates/12966971

Description:

Functional annotations of all predicted proteins in a Parastagonospora nodorum pangenome. Orthogroups were predicted using ProteinOrtho; each gene is annotated with the P. nodorum isolate in which it was predicted and the orthogroup to which it belongs. The column "is_representative" indicates whether the protein was selected as the representative member of its orthogroup. Selection prefers copies in reference isolates, then copies closest to the average sequence length within the orthogroup. Whole-protein functions were predicted by searching the Swiss-Prot database version 2020_02 (Bairoch & Apweiler, 2000) using MMseqs2 version 11-e1a1c (--start-sens 3 -s 7.5 --sens-steps 3 -a) (Steinegger & Söding, 2017). Matches were considered reliable for functional annotation if they covered >= 70% of both sequences, had >= 60% sequence identity, and had an e-value < 1e-10. Functional domains were predicted using InterProScan (P. Jones et al., 2014; Mitchell et al., 2019). Additionally, GO terms and product names were predicted using the web servers of PANNZER (Koskinen et al., 2015) and eggNOG-Mapper (Huerta-Cepas et al., 2017). GO-term predictions from InterProScan, PANNZER, and eggNOG-Mapper were combined and then filtered to remove uninformative terms listed in the GO do_not_annotate "anti-slim" set (available at: http://geneontology.org/docs/download-ontology, downloaded: 2020-05-15), forming the final GO-term set for the predicted proteomes.
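The reliability thresholds stated above for Swiss-Prot matches can be expressed as a simple predicate. The following is a minimal sketch, not the authors' code: the `Hit` record, its field names, and the example identifiers are hypothetical, and identity and coverage are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit (hypothetical record; field names are assumptions)."""
    query: str
    target: str
    identity: float    # fraction of identical aligned residues, 0..1
    query_cov: float   # fraction of the query covered by the alignment
    target_cov: float  # fraction of the target covered by the alignment
    evalue: float


def is_reliable(hit, min_cov=0.70, min_identity=0.60, max_evalue=1e-10):
    """Apply the stated thresholds: >= 70% coverage of both sequences,
    >= 60% identity, and an e-value strictly below 1e-10."""
    return (
        hit.query_cov >= min_cov
        and hit.target_cov >= min_cov
        and hit.identity >= min_identity
        and hit.evalue < max_evalue
    )


# Made-up example hits; identifiers do not refer to real entries.
hits = [
    Hit("geneA", "sp_hit_1", 0.82, 0.95, 0.91, 1e-50),  # passes all thresholds
    Hit("geneB", "sp_hit_2", 0.55, 0.90, 0.88, 1e-40),  # identity too low
    Hit("geneC", "sp_hit_3", 0.75, 0.65, 0.93, 1e-30),  # query coverage too low
    Hit("geneD", "sp_hit_4", 0.61, 0.70, 0.70, 1e-10),  # e-value not below 1e-10
]

reliable = [h.query for h in hits if is_reliable(h)]
print(reliable)
```

Only `geneA` passes in this toy input; the other three hits each fail exactly one criterion.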
Effector-like sequences were predicted using the Predector pipeline (https://github.com/ccdmb/predector, version: 0.1.0-alpha) (Submitted), which incorporates several software analyses including SignalP versions 3.0, 4.1g, 5.0b (José Juan Almagro Armenteros et al., 2019; Dyrløv Bendtsen et al., 2004; Petersen et al., 2011), DeepSig (Savojardo et al., 2018), TargetP version 2.0 (Jose Juan Almagro Armenteros et al., 2019), DeepLoc version 1.0 (Almagro Armenteros et al., 2017), TMHMM version 2.0c (Krogh et al., 2001), Phobius version 1.01 (Käll et al., 2004), EffectorP versions 1 and 2 (Sperschneider et al., 2016; Sperschneider, Dodds, Gardiner, et al., 2018), ApoplastP version 1 (Sperschneider, Dodds, Singh, et al., 2018), LOCALIZER (Sperschneider et al., 2017), homology searches against dbCAN version 8 using HMMER version 3.3 (Mistry et al., 2013), and sequence matches against PHI-base version 4.9 (Urban et al., 2020) using MMseqs2 version 11-e1a1c (Steinegger & Söding, 2017). Information from the Predector, InterProScan, PANNZER, eggNOG-Mapper, positive selection, and orthogroup analyses was combined into a single table. Raw orthogroup CDS codon alignments, gene trees, and BUSTED results are also included for all non-singleton orthogroups.
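The rule behind the "is_representative" column (prefer copies in reference isolates, then copies closest to the average sequence length within the orthogroup) could be sketched as below. This is one possible reading, not the authors' implementation: the tuple layout, the isolate names, and the final tie-break on protein ID are assumptions made for illustration.

```python
from statistics import mean


def pick_representative(members, reference_isolates):
    """Return the protein ID chosen as orthogroup representative.

    members: list of (protein_id, isolate, length) tuples (assumed layout).
    reference_isolates: set of isolate names treated as references.
    """
    avg_len = mean(length for _, _, length in members)

    def sort_key(member):
        protein_id, isolate, length = member
        return (
            isolate not in reference_isolates,  # False sorts first: references win
            abs(length - avg_len),              # then closest to the average length
            protein_id,                         # assumed deterministic tie-break
        )

    return min(members, key=sort_key)[0]


# Hypothetical orthogroup with three members; the average length is 370.
members = [
    ("p1", "ref1", 300),
    ("p2", "iso2", 310),
    ("p3", "iso3", 500),
]

with_reference = pick_representative(members, {"ref1"})
without_reference = pick_representative(members, set())
print(with_reference, without_reference)
```

With a reference isolate present, its copy (`p1`) is selected even though `p2` is closer to the average length; with no reference isolates, `p2` is selected.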


Provider: figshare
Created: 2020-11-17
