Optimizing machine learning-based diagnosis of neurosyphilis in HIV-negative patients: a multicenter, real-world comparison of international diagnostic criteria

Indexed by NIAID Data Ecosystem on 2026-05-10

Download link:

https://figshare.com/articles/dataset/Optimizing_Machine_Learning_Based_Diagnosis_of_Neurosyphilis_in_HIV-Negative_Patients_A_Multicenter_Real-World_Comparison_of_International_Diagnostic_Criteria/31900984

Description:

Neurosyphilis continues to rise globally, yet diagnosis remains challenging, often requiring multidisciplinary expertise and multiple CSF assays that are difficult to access in resource-limited settings. We developed and compared machine-learning (ML) models tailored to six international diagnostic guidelines and built a free, web-based tool for guideline-adapted decision support. We assembled 1,648 suspected neurosyphilis cases from four centers, using Guangzhou as the training cohort and Beijing, Xiamen (China) and Seattle (USA) as external validation cohorts. Five algorithms (Random Forest, AdaBoost, SVC, NuSVC, XGBoost) were trained with randomized search and three-fold cross-validation; performance was assessed by AUC, PRAUC, calibration, decision-curve net benefit, Brier score, and SHAP for model explainability. Across models and guidelines, neurological symptoms, CSF protein, and CSF white blood cell count consistently ranked as the strongest predictors. All models achieved excellent discrimination (AUC and PRAUC >0.90) with good calibration, reliability, and positive clinical utility, though performance varied modestly by guideline. These findings indicate that the optimal ML approach depends on the diagnostic definition applied. Our freely available online tool operationalizes these models to provide clinicians worldwide with context-adapted support aligned to local criteria.
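Two of the evaluation metrics named above, AUC and the Brier score, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `roc_auc` and `brier_score` and the six labels and probabilities are hypothetical and were made up for illustration only. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher predicted probability (ties count as half), and the Brier score as the mean squared difference between predicted probability and the 0/1 outcome.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = case, 0 = non-case) and predicted probabilities.
# These values are invented for illustration and are not study data.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]

auc = roc_auc(labels, probs)
brier = brier_score(labels, probs)
print(f"AUC={auc:.3f}  Brier={brier:.3f}")
```

In this toy example 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes. The pairwise AUC loop is quadratic in the number of cases, which is fine for a sketch but would normally be replaced by a rank-based library routine on a cohort of this size.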

Created:

2026-03-31