Predicting COVID-19 vaccine uptake among Nigerian women using a Supervised Machine Learning approach: Insight from the 2024 Demographic and Health Survey
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Predicting_COVID-19_vaccine_uptake_among_Nigerian_women_using_a_Supervised_Machine_Learning_approach_Insight_from_the_2024_Demographic_and_Health_Survey/31807294
Description:
This dataset contains nationally representative individual-level data extracted and harmonized from the 2024 Nigeria Demographic and Health Survey (NDHS). It was specifically curated to support the analysis presented in the manuscript investigating predictors of COVID-19 vaccine uptake among women of reproductive age (15–49 years) using supervised machine learning techniques.
The dataset comprises 36,161 observations and 43 variables, including socio-demographic characteristics, reproductive history, access to health services, media exposure, COVID-19-related knowledge and symptoms, and behavioral factors. The primary outcome variable is COVID-19 vaccine uptake (Outcome_CovidVX), coded as a binary indicator.
The NDHS employs a two-stage stratified cluster sampling design to ensure national representativeness across Nigeria’s geopolitical regions. Sampling weights (variable wgt) were incorporated to account for unequal probabilities of selection and non-response. For this study, relevant variables were selected based on prior literature and theoretical relevance to vaccine uptake behavior. Data preprocessing steps included:
Data cleaning and recoding: Standard DHS variables (e.g., education, wealth index, residence) were transformed into analytically meaningful categories.
Handling missing data: Observations with substantial missingness were excluded, while minimal missing values were handled using appropriate imputation strategies.
Feature engineering: Composite and transformed variables (e.g., EduHusbTrans, FamsizeTrans) were generated to enhance predictive performance.
Categorical encoding: Variables were encoded using label or one-hot encoding as required for machine learning models.
Analytical approaches
Supervised machine learning models were applied to predict COVID-19 vaccine uptake. The modeling pipeline included:
Algorithms used: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting, Extreme Gradient Boosting, CatBoost, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN).
Model training and validation: Data were split into training (80%) and testing (20%) sets, with cross-validation applied to optimize model performance.
Performance evaluation: Metrics such as accuracy, precision, recall, F1-score, and Area Under the ROC Curve (AUC) were used.
Feature importance analysis: Model interpretability techniques (e.g., SHAP values or feature importance rankings) were employed to identify key predictors.
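The 80/20 split and the evaluation metrics named above can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' code: the function names (`split_train_test`, `classification_metrics`, `roc_auc`), the fixed seed, and the assumption that the positive class is coded 1 are all hypothetical choices made for illustration. In practice, scikit-learn's `train_test_split` and `sklearn.metrics` functions would normally be used instead.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train, test); 80/20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Here `y_pred` holds hard 0/1 predictions while `scores` holds predicted probabilities or any other ranking score; AUC depends only on how the scores order the cases, which is why it is computed separately from the threshold-based metrics.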
Ethical Considerations and Data Access
The original NDHS data were collected by the Demographic and Health Surveys (DHS) Program with ethical approval from relevant national and international institutional review boards. All participants provided informed consent prior to data collection. This secondary analysis utilized de-identified publicly available data, and no additional ethical approval was required. Access to the original dataset can be obtained through the DHS Program upon reasonable request:
https://dhsprogram.com/
Reproducibility and Use
This dataset has been prepared to facilitate transparent and reproducible research. All variables included in the analysis are provided with consistent naming and formatting to enable replication of the machine learning workflow described in the manuscript. Researchers can directly apply similar preprocessing steps and modeling techniques using standard statistical or machine learning software (Python).
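When reusing the data for descriptive estimates, the sampling weights matter: a weighted proportion divides the weighted sum of the outcome by the sum of the weights. The sketch below is a minimal illustration, not the authors' code. It assumes each record is a dictionary keyed by the dataset's `Outcome_CovidVX` and `wgt` column names and that `Outcome_CovidVX` is coded 1 for vaccinated and 0 otherwise, which the description does not state. The function name `weighted_uptake` and the toy records are hypothetical and are not values from the dataset.

```python
def weighted_uptake(records, outcome="Outcome_CovidVX", weight="wgt"):
    """Weighted proportion of records with outcome == 1.

    Assumes a 0/1 outcome and non-negative weights (an assumption about
    the coding, not something confirmed by the dataset description).
    """
    total_weight = sum(r[weight] for r in records)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(r[weight] * r[outcome] for r in records) / total_weight


# Hypothetical toy records for illustration only; not real survey values.
toy_records = [
    {"Outcome_CovidVX": 1, "wgt": 0.5},
    {"Outcome_CovidVX": 0, "wgt": 1.5},
    {"Outcome_CovidVX": 1, "wgt": 1.0},
    {"Outcome_CovidVX": 0, "wgt": 1.0},
]

unweighted = sum(r["Outcome_CovidVX"] for r in toy_records) / len(toy_records)
weighted = weighted_uptake(toy_records)
```

Because the weights appear in both the numerator and the denominator, the estimate does not depend on how the weights are scaled. This gives a point estimate only; valid standard errors for a stratified cluster design would additionally require the cluster and stratum identifiers.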
Created:
2026-03-18



