Criteria for comparing modelling approaches.
Source: Figshare | Updated: 2025-05-29 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Criteria_for_comparing_modelling_approaches_/29188323
Description:
Accurate mapping and disaggregation of key health and demographic risk factors have become increasingly important for disease surveillance, because they can reveal geographical social inequalities, inform improved health interventions, and support monitoring of progress on relevant Sustainable Development Goals (SDGs). Household surveys like the Demographic and Health Surveys have been widely used as a proxy for mapping SDG-related household characteristics. However, there is no consensus on the workflow to be used, and different methods have been implemented with varying complexities. This study aims to compare multiple modelling frameworks to model indicators of human vulnerability to malaria (SDG Target 3.3) in Senegal. These indicators were categorised into socioeconomic indicators (e.g., stunting prevalence, wealth index) and malaria prevention indicators (e.g., indoor residual spraying, insecticide-treated net ownership). We compared three categories of commonly used methods: (1) spatial interpolation methods (i.e., inverse distance weighting, thin plate splines, kriging), (2) ensemble methods (i.e., random forest), and (3) Bayesian geostatistical models. Most indicators could be modelled with medium to high predictive accuracy, with R² values ranging from 0.40 to 0.86. No single method or method category emerged as the best, and performance varied widely. Socioeconomic indicators were generally better predicted by covariate-based models (e.g., random forest and Bayesian models), while methods using spatial autocorrelation alone (e.g., thin plate splines) performed better for variables with heterogeneous spatial structure, such as ethnicity and malaria prevention indicators. Increasing the complexity of the models did not always improve predictive performance, e.g., thin plate splines sometimes outperformed random forest or Bayesian geostatistical models.
Beyond performance, we compared the different methods using other criteria (e.g., the ability to constrain the prediction range or to quantify prediction uncertainty) and discussed their implications for selecting a modelling approach tailored to the needs of the end user.
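As a rough illustration of the simplest method category named above, the following sketch shows inverse distance weighting together with an R² score computed on held-out points. It is not the study's code: the function names, the power parameter of 2, the toy coordinates and values, and the choice of R² as the coefficient of determination (1 - SS_res / SS_tot) are all assumptions made for this example.

```python
import math


def idw_predict(known_points, known_values, query_point, power=2.0):
    """Inverse distance weighting: average of known values weighted by 1 / distance**power.

    `power` is an assumed default, not a value taken from the study.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(known_points, known_values):
        distance = math.hypot(query_point[0] - x, query_point[1] - y)
        if distance == 0.0:
            # The query coincides with a surveyed location: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical survey clusters: (x, y) coordinates and an indicator prevalence.
    train_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    train_values = [0.20, 0.40, 0.30, 0.50]

    # Hypothetical held-out clusters used to score the interpolation.
    test_points = [(1.0, 0.0), (1.0, 1.0), (0.5, 1.5)]
    test_values = [0.31, 0.36, 0.30]

    predictions = [idw_predict(train_points, train_values, p) for p in test_points]
    print("predictions:", predictions)
    print("R²:", r_squared(test_values, predictions))
```

Because an inverse-distance-weighted prediction is a weighted average with positive weights, it always stays within the range of the observed values. This is one simple instance of the "ability to constrain the prediction range" criterion mentioned above, although the sketch provides no estimate of prediction uncertainty.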
Created:
2025-05-29



