Performance of STI predictive models with SMOTE.
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/Performance_of_STI_predictive_models_with_SMOTE_/26167539
Description:
There has been a substantial increase in sexually transmitted infections (STIs) among men who have sex with men (MSM) globally. Contributing factors include unprotected sexual practices, multiple sex partners, criminalisation, stigmatisation, fear of discrimination, substance use, poor access to care, and a lack of early STI screening tools. This study therefore applied three machine learning models, multilayer perceptron (MLP), extremely randomized trees (ExtraTrees) and XGBoost, to predict STIs among MSM using bio-behavioural survey (BBS) data from Zimbabwe. Data were collected from 1538 MSM. The dataset was split into training and testing sets in an 80:20 ratio, and the synthetic minority oversampling technique (SMOTE) was applied to address class imbalance. A stepwise logistic regression model identified several predictors of STIs among MSM, including age, cohabitation with sex partners, education status and employment status. MLP outperformed the other two STI predictive models (XGBoost and ExtraTrees), achieving an accuracy of 87.54%, recall of 97.29%, precision of 89.64%, F1-score of 93.31% and AUC of 66.78%. XGBoost achieved an accuracy of 86.51%, recall of 96.51%, precision of 89.25%, F1-score of 92.74% and AUC of 54.83%. ExtraTrees recorded an accuracy of 85.47%, recall of 95.35%, precision of 89.13%, F1-score of 92.13% and AUC of 60.21%. These models can be used effectively to identify MSM at high risk of STIs, to support STI surveillance, and to further develop STI screening tools that improve health outcomes for MSM.
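The 80:20 split and SMOTE step described above can be illustrated with a minimal sketch. It is not the study's code: the description does not say which SMOTE implementation was used or at which point in the pipeline it was applied, so this sketch assumes oversampling of the training set only, with the test set left untouched. The function names (`train_test_split`, `smote`, `balance_training_set`), the neighbour count `k=5`, and the toy two-feature data are all hypothetical, and the class labels are arbitrary. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred over a hand-written one.

```python
import math
import random


def train_test_split(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle indices and hold out `test_ratio` of the records for testing."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_training_set(X, y, k=5, seed=0):
    """Oversample the minority class of a binary training set to parity."""
    counts = {c: y.count(c) for c in set(y)}
    minority_label = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority_label]
    minority = [x for x, label in zip(X, y) if label == minority_label]
    new_points = smote(minority, n_needed, k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): 90 records of class 1 and 10 of class 0.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [1] * 90 + [0] * 10

(X_train, y_train), (X_test, y_test) = train_test_split(X, y, test_ratio=0.2)
X_bal, y_bal = balance_training_set(X_train, y_train)
```

Balancing only the training portion keeps synthetic records out of the held-out set, so test metrics are computed on real observations alone.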
Created:
2024-07-03



