Stability-Based Machine Learning Identifies a Minimal Two-Protein Serum Signature for Early Silicosis
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Stability-Based_Machine_Learning_Identifies_a_Minimal_Two-Protein_Serum_Signature_for_Early_Silicosis/31942171
Description:
Background: The early diagnosis of silicosis, an irreversible fibrotic
lung disease, is challenged by the low sensitivity of current radiological
methods in early-stage disease and their susceptibility to interobserver
variability. Consequently, a pressing need exists for noninvasive,
objective biomarkers to facilitate timely detection and intervention.
Methods: We employed a multistage study design comprising a discovery
cohort (57 Stage I silicosis patients, 57 matched controls) and an
independent, unmatched validation cohort (40 patients, 40 controls).
Serum protein profiles were generated using Olink targeted proteomics.
We utilized a rigorous, stability-based machine learning framework,
which integrated Lasso, Random Forest, and SVM-RFE algorithms over
100 iterations, to perform feature selection and identify a robust
biomarker signature from the discovery cohort. Based on the selected
features, a logistic regression model was subsequently constructed,
and its performance was evaluated using both internal and external
validation. Results: Our discovery strategy identified a two-protein
signature comprising IL8 and CCL3. This signature demonstrated excellent
diagnostic performance in the discovery cohort, achieving a cross-validation
AUC of 0.986 (95% CI: 0.975–1.000). Importantly, the model’s
robustness was confirmed in the heterogeneous validation cohort, where
it achieved an outstanding AUC of 0.973 (95% CI: 0.936–1.000),
with 95.0% specificity and 77.5% sensitivity. Bioinformatic analysis
revealed that decreased serum levels of IL8 and CCL3 were associated
with silicosis, providing novel diagnostic biomarkers and highlighting
a complex, paradoxical shift in circulating chemokines during early-stage
disease.
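
The Methods describe stability-based feature selection followed by a logistic regression model. The sketch below illustrates that general idea on synthetic data; it is not the authors' pipeline. It assumes scikit-learn and NumPy, uses L1-penalised logistic regression as a stand-in for Lasso, and introduces the hypothetical names `selection_frequency`, `top_k`, and `subsample`. The subsampling scheme, the voting rule, and every hyperparameter are assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def selection_frequency(X, y, n_iter=100, top_k=2, subsample=0.8, seed=0):
    """Fraction of (iteration, selector) votes each feature receives."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for it in range(n_iter):
        # Stratified subsample without replacement (assumed scheme).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == label),
                       size=int(subsample * np.sum(y == label)),
                       replace=False)
            for label in np.unique(y)
        ])
        Xs, ys = StandardScaler().fit_transform(X[idx]), y[idx]

        # Selector 1: L1-penalised logistic regression (Lasso stand-in);
        # keep up to top_k features with nonzero coefficients.
        lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        coef = np.abs(lasso.fit(Xs, ys).coef_[0])
        lasso_pick = [j for j in np.argsort(coef)[::-1][:top_k] if coef[j] > 0]

        # Selector 2: Random Forest, top_k features by impurity importance.
        forest = RandomForestClassifier(n_estimators=100, random_state=it)
        rf_pick = np.argsort(forest.fit(Xs, ys).feature_importances_)[::-1][:top_k]

        # Selector 3: SVM-RFE with a linear kernel, eliminating one feature per step.
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=top_k)
        svm_pick = np.flatnonzero(rfe.fit(Xs, ys).support_)

        for pick in (lasso_pick, rf_pick, svm_pick):
            counts[np.asarray(pick, dtype=int)] += 1
    return counts / (3 * n_iter)


# Synthetic stand-in for a discovery cohort: 57 controls, 57 cases, 20 proteins.
# Features 0 and 1 are made lower in cases; the rest are noise.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 57)
X = rng.normal(size=(114, 20))
X[y == 1, :2] -= 2.0

# The abstract reports 100 iterations; 30 keeps this demo quick.
freq = selection_frequency(X, y, n_iter=30)
selected = np.sort(np.argsort(freq)[::-1][:2])

# Logistic regression on the selected features, scored by 5-fold cross-validated AUC.
cv_auc = cross_val_score(LogisticRegression(), X[:, selected], y,
                         cv=5, scoring="roc_auc").mean()
print("selected features:", selected, "CV AUC:", round(cv_auc, 3))
```

Tallying votes across resamples favours features that are chosen consistently rather than ones that win on a single split of the data, which is the usual motivation for stability-based selection.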
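
To make the reported validation metrics concrete, the sketch below converts them into patient counts. It assumes the 77.5% sensitivity and 95.0% specificity were computed on the full validation cohort (40 patients, 40 controls) at a single decision threshold, which the abstract does not state explicitly. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(n_cases, n_controls, sensitivity, specificity):
    """Convert sensitivity and specificity into confusion-matrix counts."""
    tp = round(sensitivity * n_cases)      # cases correctly flagged
    tn = round(specificity * n_controls)   # controls correctly cleared
    return {"tp": tp, "fn": n_cases - tp, "tn": tn, "fp": n_controls - tn}


# Validation cohort sizes and metrics as reported in the abstract.
counts = confusion_counts(n_cases=40, n_controls=40,
                          sensitivity=0.775, specificity=0.950)

accuracy = (counts["tp"] + counts["tn"]) / 80
# Predictive values here reflect the cohort's 1:1 case-control ratio,
# not the prevalence in a screening population.
ppv = counts["tp"] / (counts["tp"] + counts["fp"])
npv = counts["tn"] / (counts["tn"] + counts["fn"])

print(counts)
print(f"accuracy={accuracy:.4f} ppv={ppv:.4f} npv={npv:.4f}")
```

Under that assumption, the model would have identified 31 of 40 patients and misclassified 2 of 40 controls.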
Created:
2026-04-06



