
Table_1_Prediction of celiac disease associated epitopes and motifs in a protein.docx

Source: frontiersin.figshare.com (updated 2023-06-21, indexed 2025-01-21)
Download link:
https://frontiersin.figshare.com/articles/dataset/Table_1_Prediction_of_celiac_disease_associated_epitopes_and_motifs_in_a_protein_docx/21921342/1
Resource description:
Introduction: Celiac disease (CD) is an autoimmune gastrointestinal disorder that causes immune-mediated enteropathy in response to gluten. Gluten immunogenic peptides can trigger immune responses that damage the small intestine. HLA-DQ2 and HLA-DQ8 are the major alleles that bind to epitopes (antigenic regions) of gluten and induce celiac disease. There is a need to identify CD-associated epitopes in protein-based foods and therapeutics.

Methods: In this study, computational tools were developed to predict CD-associated epitopes and motifs. The dataset used for training, testing, and evaluation contains experimentally validated CD-associated and non-CD-associated peptides. We performed positional analysis to identify the most significant positions of amino acid residues in the peptides and checked the frequency of HLA alleles. We also computed amino acid composition to develop machine-learning-based models, and developed an ensemble method that combines the motif-based approach with the machine-learning-based models.

Results and Discussion: Our analysis supports the existing hypothesis that proline (P) and glutamine (Q) are highly abundant in CD-associated peptides. A model based on the density of P and Q in peptides was developed for predicting CD-associated peptides; it achieves a maximum AUROC of 0.98 on independent data. We discovered motifs (e.g., QPF, QPQ, PYP) that occur specifically in CD-associated peptides. We also developed machine-learning-based models using peptide composition, which achieved a maximum AUROC of 0.99. The ensemble model predicts CD-associated motifs with 100% accuracy on an independent dataset that was not used for training. The best models and motifs have been integrated into a web server and standalone software package, "CDpred". We hope this server will assist the scientific community in the prediction, design, and scanning of CD-associated peptides and CD-associated motifs in a protein/peptide sequence (https://webs.iiitd.edu.in/raghava/cdpred/).
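
The three sequence features named in the abstract (P and Q density, motif occurrence, and amino acid composition) can be illustrated with a minimal sketch. This is not the CDpred implementation: the function names, the toy peptide, and the choice to report composition as percentages are assumptions made for illustration only, and no classification threshold is implied.

```python
from collections import Counter

# Example motifs quoted in the abstract; the full CDpred motif list is not given here.
MOTIFS = ("QPF", "QPQ", "PYP")
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pq_density(peptide: str) -> float:
    """Fraction of residues that are proline (P) or glutamine (Q)."""
    peptide = peptide.upper()
    if not peptide:
        return 0.0
    return sum(residue in "PQ" for residue in peptide) / len(peptide)


def find_motifs(sequence: str, motifs=MOTIFS):
    """Return (motif, 0-based start) for every occurrence, overlaps included."""
    sequence = sequence.upper()
    hits = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            hits.append((motif, start))
            # Advance by one position so overlapping occurrences are found too.
            start = sequence.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def amino_acid_composition(peptide: str) -> dict:
    """Percentage of each of the 20 standard amino acids in the peptide."""
    peptide = peptide.upper()
    counts = Counter(peptide)
    length = len(peptide) or 1  # avoid division by zero for empty input
    return {aa: 100.0 * counts[aa] / length for aa in STANDARD_AMINO_ACIDS}


if __name__ == "__main__":
    toy_peptide = "QPFPQPQLPYPQ"  # hypothetical sequence, not taken from the dataset
    print(pq_density(toy_peptide))
    print(find_motifs(toy_peptide))
    print(amino_acid_composition(toy_peptide)["P"])
```

In a pipeline like the one described, the composition vector would feed a machine learning classifier and the motif hits would feed the motif-based component. How CDpred actually combines the two in its ensemble is not specified in this description.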

Provider: Frontiers