Predictive Stool-Based Protein Biomarkers for the Classification of Crohn's Disease and Ulcerative Colitis Using a Machine Learning Approach

Indexed from NIAID Data Ecosystem on 2026-05-10

Download link:

https://www.omicsdi.org/dataset/pride/PXD057120

Description:

Background and Aim: Crohn's disease (CD) and ulcerative colitis (UC) are the two major chronic inflammatory bowel diseases (IBD). Although their symptoms are similar, their pathological features and clinical treatments differ. Currently, distinguishing between them requires invasive procedures such as colonoscopy and histopathology, which cause discomfort and inconvenience to patients. Fecal proteins offer a promising non-invasive alternative as biomarkers because of their stability and proximity to inflamed tissue. This study uses high-throughput data-independent acquisition (DIA) mass spectrometry to develop accurate biomarker signatures from complex stool samples.
Methods: Stool samples from 46 patients with active CD and 23 patients with active UC were analyzed. Using DIA-based SWATH mass spectrometry, we explored the stool proteome, identifying and quantifying approximately 1,250 proteins. The samples were divided into training and testing groups. After data processing, several feature selection algorithms were applied to the training group to identify proteins that differed significantly between the CD and UC groups. In addition, six machine learning (ML) algorithms (k-Nearest Neighbors, Naïve Bayes, eXtreme Gradient Boosting, Random Forest, Support Vector Machine, and glmnet) were evaluated to identify the best-performing classifier.
Results: Sixteen proteins were selected on the basis of several feature selection algorithms, and the six ML models were trained on them. Based on each algorithm's performance metrics on the training dataset, the Naïve Bayes model was selected. For performance validation, the final predictive model was applied to 16 prospective samples as the test dataset. The model achieved an AUC of 0.95 on the training dataset and 0.96 on the test dataset, suggesting that it is robust and not overfitted.
Conclusion: This study demonstrates the effectiveness of SWATH-based proteomics and machine learning in developing predictive models to classify CD and UC. Further validation in a larger cohort using targeted MRM mass spectrometry would help establish the clinical utility and reliability of this approach.
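
To make the evaluation described above concrete, the following is a minimal sketch of fitting a Gaussian Naive Bayes classifier on a training set and comparing AUC on training and held-out samples. It is not the authors' pipeline: the data are synthetic, the sample sizes and the `shift` effect size are arbitrary, the choice of 16 features only mirrors the size of the reported protein panel, and all function names (`fit_gaussian_nb`, `log_odds`, `auc`, `make_synthetic`) are hypothetical. The abstract does not state which Naive Bayes variant or software was used, so the Gaussian form is an assumption.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def log_odds(model, x, positive=1, negative=0):
    """Log posterior odds of the positive class under feature independence."""
    def joint(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return joint(positive) - joint(negative)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def make_synthetic(n_pos, n_neg, n_features, shift, rng):
    """Synthetic stand-in for protein intensities (assumption, not real data)."""
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        for _ in range(n):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_synthetic(40, 20, 16, 0.8, rng)  # arbitrary sizes
X_test, y_test = make_synthetic(8, 8, 16, 0.8, rng)      # arbitrary sizes

model = fit_gaussian_nb(X_train, y_train)
train_auc = auc([log_odds(model, x) for x in X_train], y_train)
test_auc = auc([log_odds(model, x) for x in X_test], y_test)
print(f"train AUC = {train_auc:.2f}, test AUC = {test_auc:.2f}")
```

A test AUC close to the training AUC is the pattern the abstract reports. With a test set as small as 16 samples the estimate carries wide uncertainty, which is consistent with the authors' call for validation in a larger cohort.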

Created:

2025-12-01
