Orthogroup functional annotations and effector candidates
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Orthogroup_functional_annotations_and_effector_candidates/12966971/4
Description:
Functional annotations of all predicted proteins in a Parastagonospora nodorum pangenome.
Orthogroups were predicted using ProteinOrtho. Each gene is annotated with the P. nodorum isolate in which it was predicted and the orthogroup to which it belongs.
The column "is_representative" indicates whether the protein was selected as the representative member of its orthogroup. Selection prefers copies in reference isolates, then copies closest to the average sequence length within the orthogroup.
Predicted whole-protein functions were found by searching the Swiss-Prot database version 2020_02 (Bairoch & Apweiler, 2000) using MMSeqs2 version 11-e1a1c (--start-sens 3 -s 7.5 --sens-steps 3 -a) (Steinegger & Söding, 2017). Matches were considered reliable for functional annotation if they covered >= 70% of both sequences, with >= 60% sequence identity and an e-value < 1e-10. Functional domains were predicted using InterProScan (P. Jones et al., 2014; Mitchell et al., 2019). Additionally, GO terms and product names were predicted using the web servers of PANNZER (Koskinen et al., 2015) and eggNOG-Mapper (Huerta-Cepas et al., 2017).
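The two rules described above (choice of the representative protein and the reliability thresholds for Swiss-Prot matches) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the `Protein` record, both function names, and the tie-break on protein ID are hypothetical, and the mean length is assumed to be taken over all members of the orthogroup. Coverage and identity are assumed to be supplied as fractions between 0 and 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Protein:
    # Hypothetical record; field names are not taken from the dataset.
    protein_id: str
    isolate: str   # isolate the protein was predicted in
    length: int    # amino-acid length


def pick_representative(members, reference_isolates):
    """Prefer copies from reference isolates, then the copy whose length
    is closest to the mean length of the whole orthogroup."""
    avg = mean(p.length for p in members)
    # Fall back to all members when no reference-isolate copy exists.
    candidates = [p for p in members if p.isolate in reference_isolates] or members
    # Assumption: ties are broken by protein_id to keep the result deterministic.
    return min(candidates, key=lambda p: (abs(p.length - avg), p.protein_id))


def is_reliable_hit(query_cov, target_cov, identity, evalue):
    """Thresholds from the description: >= 70% coverage of both sequences,
    >= 60% sequence identity, and e-value < 1e-10."""
    return (
        query_cov >= 0.70
        and target_cov >= 0.70
        and identity >= 0.60
        and evalue < 1e-10
    )
```

Note that the e-value comparison is strict, so a match with an e-value of exactly 1e-10 is not counted as reliable.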
GO-term predictions from InterProScan, PANNZER, and eggNOG-Mapper were combined and filtered to exclude terms in the GO do_not_annotate “anti-slim” set (available at: http://geneontology.org/docs/download-ontology, downloaded: 2020-05-15), removing uninformative terms and forming the final GO-term set for the predicted proteomes.
Effector-like sequences were predicted using the Predector pipeline (https://github.com/ccdmb/predector, version: 0.1.0-alpha) (Submitted), which incorporates several software analyses including SignalP versions 3.0, 4.1g, 5.0b (José Juan Almagro Armenteros et al., 2019; Dyrløv Bendtsen et al., 2004; Petersen et al., 2011), DeepSig (Savojardo et al., 2018), TargetP version 2.0 (Jose Juan Almagro Armenteros et al., 2019), DeepLoc version 1.0 (Almagro Armenteros et al., 2017), TMHMM version 2.0c (Krogh et al., 2001), Phobius version 1.01 (Käll et al., 2004), EffectorP versions 1 and 2 (Sperschneider et al., 2016; Sperschneider, Dodds, Gardiner, et al., 2018), ApoplastP version 1 (Sperschneider, Dodds, Singh, et al., 2018), LOCALIZER (Sperschneider et al., 2017), homology searches against dbCAN version 8 using HMMER version 3.3 (Mistry et al., 2013), and sequence matches against PHI-base version 4.9 (Urban et al., 2020) using MMSeqs2 version 11-e1a1c (Steinegger & Söding, 2017).
Information from Predector, InterProScan, PANNZER, eggNOG-Mapper, positive selection, and orthogroup analyses was combined into a single table.
Raw orthogroup CDS codon alignments, gene trees, and BUSTED results are also included for all non-singleton orthogroups.
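The GO-term step described above (pool the predictions of the three tools, then drop terms in the do_not_annotate set) amounts to a set union followed by a set difference. A minimal sketch, assuming predictions arrive as (protein ID, GO ID) pairs and that the do_not_annotate subset has already been parsed into a set of GO IDs; the function name and input layout are hypothetical and not taken from the dataset:

```python
def combine_go_terms(predictions, do_not_annotate):
    """Pool (protein_id, go_id) pairs from all tools and drop excluded terms.

    predictions: iterable of (protein_id, go_id) pairs, e.g. the concatenated
        output of InterProScan, PANNZER, and eggNOG-Mapper (hypothetical layout).
    do_not_annotate: set of GO IDs to exclude.
    Returns a dict mapping protein_id to its set of retained GO IDs.
    """
    combined = {}
    for protein_id, go_id in predictions:
        if go_id in do_not_annotate:
            continue  # uninformative term, excluded from the final set
        # Sets remove duplicates when several tools predict the same term.
        combined.setdefault(protein_id, set()).add(go_id)
    return combined
```

In this sketch a protein whose predicted terms are all excluded does not appear in the result at all; whether the published table keeps such proteins with an empty GO field is not stated in the description.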
Provider:
figshare
Created:
2021-11-01



