
Predictive Stool-Based Protein Biomarkers for the Classification of Crohn's Disease and Ulcerative Colitis Using a Machine Learning Approach

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://www.omicsdi.org/dataset/pride/PXD057120
Resource description:
Background and Aim: Crohn's disease (CD) and ulcerative colitis (UC) are the two major forms of chronic inflammatory bowel disease (IBD). Although their symptoms are similar, their pathological features and clinical treatments differ. Currently, distinguishing between them requires invasive procedures such as colonoscopy and histopathology, which cause patients discomfort and inconvenience. Fecal proteins are a promising non-invasive alternative as biomarkers because of their stability and their proximity to inflamed tissue. This study uses high-throughput data-independent acquisition (DIA) mass spectrometry to develop accurate biomarker signatures from complex stool samples.
Methods: Stool samples from 46 patients with active CD and 23 patients with active UC were analyzed. Using DIA-based SWATH mass spectrometry, we profiled the stool proteome, identifying and quantifying approximately 1,250 proteins. The samples were divided into training and testing groups. After data processing, several feature selection algorithms were applied to the training group to identify proteins that differed significantly between the CD and UC groups. Six machine learning algorithms (k-Nearest Neighbors, Naïve Bayes, eXtreme Gradient Boosting, Random Forest, Support Vector Machine, and glmnet) were then evaluated to identify the best-performing classifier.
Results: Sixteen proteins were selected using several feature selection algorithms, and the six ML models were trained on them. Based on each algorithm's performance metrics on the training dataset, the Naïve Bayes model was selected. For validation, the final predictive model was applied to 16 prospective samples as the test dataset. The model achieved an AUC of 0.95 on the training dataset and 0.96 on the test dataset, demonstrating its robustness and lack of overfitting.
Conclusion: This study demonstrates the effectiveness of SWATH-based proteomics combined with machine learning for building predictive models that classify CD and UC. Further validation in a larger cohort using targeted multiple reaction monitoring (MRM) mass spectrometry is needed to establish the clinical utility and reliability of this approach.
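The abstract does not say which feature selection algorithms were used, which Naïve Bayes variant was used, or how AUC was computed. The following is a minimal pure-Python sketch of that kind of pipeline under stated assumptions:

- Proteins are ranked by the absolute Welch t statistic between CD and UC. This is a swapped-in, illustrative selection method, not necessarily one the authors used.
- A Gaussian Naïve Bayes model is fitted on the top 16 proteins.
- AUC on held-out samples is computed as the probability that a UC sample scores higher than a CD sample.

All data are synthetic. Every function name (`synthetic_cohort`, `rank_features`, `fit_gaussian_nb`, and so on) is hypothetical and for illustration only.

```python
"""Feature ranking + Gaussian Naive Bayes + ROC AUC on synthetic data.

Hypothetical sketch: all data are randomly generated and all function
names are invented for illustration; this is not the study's pipeline.
"""
import math
import random
from statistics import fmean, pvariance, variance


def synthetic_cohort(n_cd, n_uc, n_features=40, n_informative=16,
                     shift=1.0, seed=0):
    """Random stand-in for log protein intensities.

    UC samples are shifted by `shift` on the first `n_informative` features;
    the remaining features are pure noise for both classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, n in (("CD", n_cd), ("UC", n_uc)):
        means = [shift if label == "UC" and j < n_informative else 0.0
                 for j in range(n_features)]
        for _ in range(n):
            X.append([rng.gauss(m, 1.0) for m in means])
            y.append(label)
    return X, y


def rank_features(X, y, a="CD", b="UC", k=16):
    """Return indices of the k features with the largest |Welch t| (a vs b)."""
    def welch_t(j):
        xa = [x[j] for x, t in zip(X, y) if t == a]
        xb = [x[j] for x, t in zip(X, y) if t == b]
        se = math.sqrt(variance(xa) / len(xa) + variance(xb) / len(xb))
        return (fmean(xa) - fmean(xb)) / se

    order = sorted(range(len(X[0])), key=lambda j: abs(welch_t(j)), reverse=True)
    return order[:k]


def select_columns(rows, cols):
    """Keep only the chosen feature columns, in the given order."""
    return [[row[j] for j in cols] for row in rows]


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class log prior, feature means and variances (plus a small epsilon)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "log_prior": math.log(len(rows) / len(y)),
            "mean": [fmean(c) for c in cols],
            "var": [pvariance(c) + var_floor for c in cols],
        }
    return model


def log_joint(params, x):
    """log P(class) + sum of per-feature Gaussian log densities (naive independence)."""
    total = params["log_prior"]
    for xi, mu, var in zip(x, params["mean"], params["var"]):
        total -= 0.5 * math.log(2 * math.pi * var) + (xi - mu) ** 2 / (2 * var)
    return total


def predict_proba(model, x, positive):
    """Posterior P(positive | x), normalised with the log-sum-exp trick."""
    scores = {label: log_joint(p, x) for label, p in model.items()}
    top = max(scores.values())
    denom = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[positive] - top) / denom


def roc_auc(y_true, scores, positive):
    """AUC = P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Arbitrary synthetic cohort sizes, not the study's actual split.
    X_train, y_train = synthetic_cohort(40, 20, seed=1)
    X_test, y_test = synthetic_cohort(8, 8, seed=2)
    keep = rank_features(X_train, y_train, k=16)  # selection uses training data only
    model = fit_gaussian_nb(select_columns(X_train, keep), y_train)
    scores = [predict_proba(model, x, positive="UC")
              for x in select_columns(X_test, keep)]
    print("selected feature indices:", sorted(keep))
    print(f"test AUC = {roc_auc(y_test, scores, positive='UC'):.2f}")
```

In this sketch, features are ranked on the training samples only and then applied unchanged to the test samples. This keeps test information out of model building, which is what makes a held-out AUC meaningful. Gaussian Naïve Bayes treats each protein as independent given the class. That is a strong simplification for correlated protein intensities, and it is one reason the model is cheap to fit on a small cohort.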
Created: 2025-12-01