SVM-Based Prediction of Propeptide Cleavage Sites in Spider Toxins Identifies Toxin Innovation in an Australian Tarantula

NIAID Data Ecosystem, indexed 2026-03-07

Download link:

https://figshare.com/articles/dataset/_SVM_Based_Prediction_of_Propeptide_Cleavage_Sites_in_Spider_Toxins_Identifies_Toxin_Innovation_in_an_Australian_Tarantula_/749897

Description:

Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that is excised from the mature peptide by enzymatic cleavage. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov models, and decision trees) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html), a web portal to a comprehensive relational database of spider toxins. All training data, test data, and scripts used are available from the SpiderP website.
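The sensitivity and specificity figures quoted above can be made concrete with a small sketch. It is not the SpiderP implementation. It assumes that candidate cleavage sites are scored over sliding 8-residue windows (a guess based on the "8-mer specificity" wording, which the abstract does not define further) and that the metrics follow the standard definitions. The function names, the toy sequence, and the labels are all hypothetical.

```python
from typing import List, Tuple

WINDOW = 8  # assumption: window length taken from the "8-mer specificity" wording


def windows(precursor: str, size: int = WINDOW) -> List[Tuple[int, str]]:
    """Return (start, k-mer) for every full-length window in the precursor."""
    return [(i, precursor[i:i + size]) for i in range(len(precursor) - size + 1)]


def sensitivity_specificity(truth: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative window")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    toy = "ACDEFGHIKLMN"  # made-up 12-residue string, not a real toxin precursor
    kmers = windows(toy)
    truth = [i == 2 for i, _ in kmers]           # pretend window 2 holds the true site
    predicted = [i in (2, 4) for i, _ in kmers]  # pretend classifier output
    print(sensitivity_specificity(truth, predicted))
```

Because a precursor contains many more non-site windows than true cleavage sites, window-level specificity can stay very high even when a few windows are misclassified, which is one plausible reading of why the abstract reports it separately from sensitivity.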

Created:

2013-07-22