
Predector - supplementary material

Source: DataCite Commons. Updated: 2020-12-03. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Predector_-_supplementary_material/13325213
Resource description:
All supplementary material and full-resolution figures for the Predector pipeline manuscript.

Figure 1: UpSet plot showing predictions of signal peptides, transmembrane domains, and effector-like properties for all known effectors in the training dataset (N=125). Rows indicate sets of proteins predicted to have a property related to effector prediction (e.g. a signal peptide), with the horizontal bar chart indicating set size. Columns indicate where the horizontal sets intersect with each other, and the vertical bar chart indicates the number of proteins in each intersection. For clarity, intersections with only 1 member have been excluded; the full plot is presented in supplementary figure 1.

Figure 2: A violin plot showing the distributions of Predector effector ranking scores for each class in the test and training datasets. The effectors consist of experimentally validated fungal effector sequences. "Secreted" and "non-secreted" proteins are manually annotated proteins from the SwissProt database. Proteomes consist of the complete predicted proteomes from 10 well-studied fungi (Supplementary Table 2). The number of proteins represented by each violin is indicated on the x-axis.

Figure 3: Comparison of Predector scores with those of EffectorP versions 1 and 2 for proteins in the testing dataset. Scatter plots in the lower-left corner compare predictive scores between methods, with predicted secreted proteins (any signal peptide and fewer than two TM domains predicted) shown in yellow and non-secreted proteins shown in blue. Density plots along the diagonal show the distributions of the full test dataset versus predictive scores for each method (indicated along the x-axis), also coloured by secretion prediction as before (note: there are far more non-secreted than secreted proteins in the dataset). Scatter plots in the top-right corner compare scores between methods for confirmed effectors, coloured by whether they have been predicted as secreted (criteria as above), or additionally predicted by EffectorP versions 1 or 2. Two proteins that are misclassified by a Predector score > 0 are labelled in the top-right subplot.

Supplementary Table 1: Examples of confirmed fungal plant pathogenicity effector proteins that do not exhibit the commonly targeted protein properties of low molecular weight, cysteine-richness, and presence of a classical N-terminal secretion signal peptide.
Supplementary Table 2: Datasets used for training and evaluation.
Supplementary Table 3: Weights assigned for manual scores. Description of parameters used to calculate combined Predector scores, based on weight-adjusted values. Individual scores were determined by multiplying the value by its weight, and the combined Predector score was calculated as the sum of all individual scores.
Supplementary Table 4: Extended model evaluation and statistics.

The supplementary figures document contains its own documentation.
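The two rules stated in the description (the secretion criterion used in Figure 3 and the weighted sum described for Supplementary Table 3) can be illustrated with a minimal Python sketch. The function names, feature names, and weight values below are hypothetical and chosen only for illustration; the actual parameters and weights are those listed in Supplementary Table 3 and are not reproduced here.

```python
from typing import Mapping


def is_predicted_secreted(has_signal_peptide: bool, n_tm_domains: int) -> bool:
    """Secretion criterion as worded in the Figure 3 caption:
    any signal peptide and fewer than two TM domains predicted."""
    return has_signal_peptide and n_tm_domains < 2


def combined_score(values: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Multiply each value by its weight and sum the individual scores."""
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"no value supplied for: {sorted(missing)}")
    return sum(values[name] * weights[name] for name in weights)


# Hypothetical feature names and weights (NOT taken from Supplementary Table 3).
example_weights = {
    "signal_peptide": 2.0,
    "effector_like": 1.5,
    "transmembrane_domains": -1.0,
}
example_values = {
    "signal_peptide": 1.0,
    "effector_like": 0.5,
    "transmembrane_domains": 2.0,
}

score = combined_score(example_values, example_weights)
secreted = is_predicted_secreted(has_signal_peptide=True, n_tm_domains=1)
```

With these made-up inputs, each individual score is a value multiplied by its weight, and the combined score is their sum; a negative weight lowers the combined score for proteins with that property.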

Provider:
figshare
Created:
2020-12-03