
Orthogroup functional annotations and effector candidates

Source: DataCite Commons. Updated: 2021-08-16. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Orthogroup_functional_annotations_and_effector_candidates/12966971/1
Description:
Functional annotations of all predicted proteins in a Parastagonospora nodorum pangenome.

Orthogroups were predicted using OrthoFinder. Each gene is annotated with the P. nodorum isolate in which it was predicted and the orthogroup to which it belongs.

Whole-protein functions were predicted by searching the Swiss-Prot database version 2020_02 (Bairoch & Apweiler, 2000) using MMSeqs2 version 11-e1a1c (--start-sens 3 -s 7.5 --sens-steps 3 -a) (Steinegger & Söding, 2017). Matches were considered reliable for functional annotation if they covered >= 70% of both sequences, with >= 60% sequence identity and an e-value < 1e-10. Functional domains were predicted using InterProScan (P. Jones et al., 2014; Mitchell et al., 2019). Additionally, GO terms and product names were predicted using the web servers of PANNZER (Koskinen et al., 2015) and eggNOG-Mapper (Huerta-Cepas et al., 2017). GO-term predictions from InterProScan, PANNZER, and eggNOG-Mapper were combined and filtered to exclude terms in the GO do_not_annotate “anti-slim” set (available at: http://geneontology.org/docs/download-ontology, downloaded: 2020-05-15), which removes uninformative terms. The result is the final GO-term set for the predicted proteomes.

Effector-like sequences were predicted using the Predector pipeline (https://github.com/ccdmb/predector, version: 0.1.0-alpha) (Submitted), which incorporates several software analyses, including SignalP versions 3.0, 4.1g, 5.0b (José Juan Almagro Armenteros et al., 2019; Dyrløv Bendtsen et al., 2004; Petersen et al., 2011), DeepSig (Savojardo et al., 2018), TargetP version 2.0 (Jose Juan Almagro Armenteros et al., 2019), DeepLoc version 1.0 (Almagro Armenteros et al., 2017), TMHMM version 2.0c (Krogh et al., 2001), Phobius version 1.01 (Käll et al., 2004), EffectorP versions 1 and 2 (Sperschneider et al., 2016; Sperschneider, Dodds, Gardiner, et al., 2018), ApoplastP version 1 (Sperschneider, Dodds, Singh, et al., 2018), LOCALIZER (Sperschneider et al., 2017), homology searches against dbCAN version 8 using HMMER version 3.3 (Mistry et al., 2013), and sequence matches against PHI-base version 4.9 (Urban et al., 2020) using MMSeqs2 version 11-e1a1c (Steinegger & Söding, 2017).

Information from Predector, InterProScan, PANNZER, eggNOG-Mapper, positive selection, and orthogroup analyses was combined into a single table.
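
The reliability thresholds for Swiss-Prot matches and the GO-term merging step described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Hit` record, its field names, and the helper functions `is_reliable` and `combine_go_terms` are all hypothetical, and coverage and identity are assumed to be expressed as fractions between 0 and 1. The GO identifiers in the usage lines are placeholders chosen for illustration and say nothing about the actual contents of the do_not_annotate set.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One query/target alignment. Field names are illustrative only."""
    query_cov: float   # fraction of the query covered by the alignment (0-1)
    target_cov: float  # fraction of the target covered by the alignment (0-1)
    identity: float    # fractional sequence identity (0-1)
    evalue: float


def is_reliable(hit, min_cov=0.70, min_identity=0.60, max_evalue=1e-10):
    """Apply the stated thresholds: >= 70% coverage of both sequences,
    >= 60% identity, and an e-value strictly below 1e-10."""
    return (
        hit.query_cov >= min_cov
        and hit.target_cov >= min_cov
        and hit.identity >= min_identity
        and hit.evalue < max_evalue
    )


def combine_go_terms(*sources, do_not_annotate=frozenset()):
    """Union the GO terms from several predictors, then drop excluded terms."""
    combined = set().union(*sources)
    return combined - set(do_not_annotate)


# Usage with made-up values.
hits = [
    Hit(0.95, 0.88, 0.72, 1e-50),  # passes every threshold
    Hit(0.95, 0.40, 0.72, 1e-50),  # target coverage too low
    Hit(0.80, 0.80, 0.65, 1e-10),  # e-value is not strictly below 1e-10
]
reliable = [h for h in hits if is_reliable(h)]

go_terms = combine_go_terms(
    {"GO:0005576", "GO:0003674"},   # e.g. terms from one predictor
    {"GO:0005576"},                 # e.g. terms from another predictor
    do_not_annotate={"GO:0003674"}, # placeholder exclusion set
)
```

Requiring coverage of both the query and the target, rather than the query alone, guards against transferring a function from a match that spans only a single shared domain.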

Provider:
figshare
Created:
2020-11-17