Data_Sheet_1_Predictive models for small-for-gestational-age births in women exposed to pesticides before pregnancy based on multiple machine learning algorithms.ZIP
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Predictive_models_for_small-for-gestational-age_births_in_women_exposed_to_pesticides_before_pregnancy_based_on_multiple_machine_learning_algorithms_ZIP/20446560
Description:
Background: The association between prenatal pesticide exposures and a higher incidence of small-for-gestational-age (SGA) births has been reported. No prediction model has been developed for SGA neonates in pregnant women exposed to pesticides prior to pregnancy.
Methods: A retrospective cohort study was conducted using information from the National Free Preconception Health Examination Project between 2010 and 2012. The dataset was split at random into a development set (n = 606) and a validation set (n = 151). A traditional logistic regression (LR) method and six machine learning classifiers were used to develop prediction models for SGA neonates. The Shapley Additive Explanation (SHAP) model was applied to determine the variables that contributed most to the predicted outcome.
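The random split described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_indices` and the seed value are assumptions, and the source does not say whether the authors stratified the split by outcome or which software they used.

```python
import random


def split_indices(n_total, n_validation, seed=0):
    """Randomly partition range(n_total) into development and validation indices.

    The seed is a hypothetical choice for reproducibility; the source
    does not report one.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    validation = sorted(indices[:n_validation])
    development = sorted(indices[n_validation:])
    return development, validation


# Sizes taken from the abstract: 757 neonates, 151 held out for validation.
development, validation = split_indices(757, 151)
print(len(development), len(validation))  # 606 151
```

With these sizes the validation set holds roughly 20% of the records (151 of 757).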
Results: A total of 757 neonates were analyzed. SGA occurred in 12.9% (n = 98) of cases overall. With an area under the receiver-operating-characteristic curve (AUC) of 0.855 [95% confidence interval (CI): 0.752–0.959], the model based on the category boosting (CatBoost) algorithm obtained the best performance in the validation set. With the exception of the LR model (AUC: 0.691, 95% CI: 0.554–0.828), all models had good AUCs. Using a recursive feature elimination (RFE) approach for feature selection, we included 15 variables in the final model based on the CatBoost classifier, which achieved an AUC of 0.811 (95% CI: 0.675–0.947).
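For readers unfamiliar with the AUC metric reported above, the sketch below computes it from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are toy values invented for illustration, not data from the study, and the function name `roc_auc` is an assumption. The last lines re-derive the reported SGA prevalence from the counts in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values, NOT study data): 3 positives, 5 negatives.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.5, 0.8, 0.2, 0.05]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.933 (14 of 15 positive-negative pairs ranked correctly)

# Consistency check on the abstract's figures: 98 SGA cases among 757 neonates.
prevalence_percent = round(100 * 98 / 757, 1)
print(prevalence_percent)  # 12.9
```

The pairwise loop is quadratic in the number of cases, which is fine at this dataset's scale; the abstract does not state how the authors computed the AUC or its confidence intervals.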
Conclusions: Machine learning algorithms can develop satisfactory tools for SGA prediction in mothers exposed to pesticides prior to pregnancy, which might become a tool to predict SGA neonates in the high-risk population.
Created:
2022-08-08



