
Soft-Label Machine Learning for Verbal Autopsy: Incorporating Diagnostic Uncertainty in Cause-of-Death Estimation

DataCite Commons | Updated 2026-05-06 | Indexed 2026-05-09
Download link:
https://borealisdata.ca/citation?persistentId=doi:10.5683/SP3/PKWXT4
Description:
Objective: Verbal autopsy (VA) is widely used to infer causes of death (COD) in settings where medical certification and autopsy are unavailable. COD determination by two physicians after review of the verbal autopsy is commonly used, but it is time- and effort-intensive and often clouded by diagnostic uncertainty. We introduce a framework for soft-label (probabilistic) supervised VA modelling that accommodates physician diagnostic uncertainty in settings where gold-standard labels are unavailable, and compare it with a traditional hard-label (deterministic) approach.

Materials and Methods: Using pediatric post-discharge mortality VA data (N = 356) from a multisite cohort study in Uganda, we constructed soft labels (probability vectors) from primary and alternative physician-assigned causes, along with their associated confidence levels. XGBoost was used to build and compare models trained on soft labels versus hard labels derived from the primary cause of death.

Results: The soft-label model achieved a population-level performance, assessed by cause-specific mortality fraction (CSMF) accuracy, of 0.88 ± 0.04, compared with 0.78 ± 0.03 for the hard-label model. Individual-level performance was slightly lower for soft labels, with a chance-corrected concordance of 0.38 ± 0.05 versus 0.40 ± 0.06 for hard labels.

Discussion: By explicitly modelling physician uncertainty, soft-label learning substantially improves population-level cause-specific mortality estimation derived from VA data. Individual-level concordance remains limited, reflecting data constraints and the nature of VA. The proposed framework, with its robust CSMF estimates, is well suited for mortality surveillance and policy-relevant applications in low-resource settings.
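As an illustration of the soft-label construction described under Materials and Methods, the following is a minimal sketch and not the authors' implementation. The cause list, the mapping from confidence level to numeric weight, and the rule of summing weights across physicians before normalising are all assumptions made for this example; the description above does not specify them.

```python
import numpy as np

# Hypothetical cause categories; the dataset's actual cause list is not given here.
CAUSES = ["pneumonia", "malaria", "sepsis", "diarrhoea", "other"]

# Hypothetical mapping from a stated confidence level to a numeric weight.
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}


def soft_label(assignments, causes=CAUSES):
    """Convert (cause, confidence) pairs into a probability vector over causes.

    `assignments` pools the primary and alternative causes given by all
    reviewing physicians for one death. Weights are summed per cause and
    then normalised so the vector sums to 1 (an assumed aggregation rule).
    """
    vec = np.zeros(len(causes))
    for cause, confidence in assignments:
        vec[causes.index(cause)] += CONFIDENCE_WEIGHT[confidence]
    total = vec.sum()
    if total == 0:
        raise ValueError("at least one cause assignment is required")
    return vec / total


def hard_label(primary_cause, causes=CAUSES):
    """One-hot vector for the primary cause only (the deterministic baseline)."""
    vec = np.zeros(len(causes))
    vec[causes.index(primary_cause)] = 1.0
    return vec


# Physician A: primary pneumonia (high), alternative sepsis (low).
# Physician B: primary sepsis (medium).
example_soft = soft_label(
    [("pneumonia", "high"), ("sepsis", "low"), ("sepsis", "medium")]
)
example_hard = hard_label("pneumonia")
```

In this sketch the soft label keeps the probability mass that the physicians placed on sepsis, whereas the hard label discards it, which is the distinction the two modelling approaches are compared on.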
Conclusion: Incorporating uncertainty via soft labels provides a principled alternative to hard labels when establishing a definitive cause of death is infeasible.

Data Collection Methods: All data were collected at the point of care using encrypted study tablets and then uploaded to a Research Electronic Data Capture (REDCap) database hosted at the BC Children’s Hospital Research Institute (Vancouver, Canada). At admission, trained study nurses systematically collected data on clinical, social and demographic variables. Following discharge, field officers contacted caregivers at 2 and 4 months by phone, and in person at 6 months, to determine vital status, post-discharge health-seeking, and readmission details. Verbal autopsies were conducted for children who had died following discharge. Each VA case was independently reviewed by two physicians.

Data Processing Methods: For this analysis, selected features from the verbal autopsy data, physician assessments of the verbal autopsies, and clinical features of the deceased from both cohorts (0-6 months and 6-60 months) were combined and analyzed as a single dataset. The code was developed in JupyterLab, and the analysis was conducted in Python 3.12.0.

Ethics Declaration: Ethics approval for the prospective, multisite, observational cohort study was obtained from the Mbarara University of Science and Technology Research Ethics Committee (15/10-16), the Uganda National Institute of Science and Technology (HS 2207), and the University of British Columbia-Children and Women’s Health Centre of British Columbia Research Ethics Board (H16-02679).

Funding Source(s): This study was funded in part by Mitacs through the Mitacs Accelerate program (Project IT47298), which provided internship funding support for Manav Doshi. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
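For readers unfamiliar with the two metrics reported in the Results, the sketch below shows the definitions commonly used in the verbal autopsy literature: CSMF accuracy as one minus the normalised absolute error between true and predicted cause fractions, and chance-corrected concordance as per-cause sensitivity corrected for random assignment, averaged over causes. This is an assumption about the definitions; the description above does not state which variants the authors used, and the integer-coded toy labels are made up for illustration.

```python
import numpy as np


def csmf_accuracy(true_csmf, pred_csmf):
    """CSMF accuracy: 1 - sum|true - pred| / (2 * (1 - min(true)))."""
    true_csmf = np.asarray(true_csmf, dtype=float)
    pred_csmf = np.asarray(pred_csmf, dtype=float)
    abs_error = np.abs(true_csmf - pred_csmf).sum()
    return 1.0 - abs_error / (2.0 * (1.0 - true_csmf.min()))


def chance_corrected_concordance(y_true, y_pred, n_causes):
    """Mean over causes of (sensitivity - 1/N) / (1 - 1/N)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_cause = []
    for j in range(n_causes):
        mask = y_true == j
        if not mask.any():
            continue  # skip causes with no reference deaths
        sensitivity = (y_pred[mask] == j).mean()
        per_cause.append((sensitivity - 1.0 / n_causes) / (1.0 - 1.0 / n_causes))
    return float(np.mean(per_cause))


# Toy data: six deaths, three causes coded 0..2 (illustrative only).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
n_causes = 3

# Population-level fractions are the share of deaths assigned to each cause.
true_csmf = np.bincount(y_true, minlength=n_causes) / len(y_true)
pred_csmf = np.bincount(y_pred, minlength=n_causes) / len(y_pred)

toy_csmf_acc = csmf_accuracy(true_csmf, pred_csmf)
toy_ccc = chance_corrected_concordance(y_true, y_pred, n_causes)
```

Because CSMF accuracy compares only aggregate fractions, individual misclassifications that cancel out across causes do not lower it, which is one reason a model can score well at the population level while individual-level concordance stays modest.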
Provider:
Borealis
Created:
2026-05-06