Dataset for explainable AI-based spatiotemporal risk factor analysis in public health: A case study of South Korea
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13944581
Description:
This dataset supports explainable artificial intelligence (XAI)-based risk factor analysis of the cardiovascular disease age-standardized mortality rate (CVD-ASMR) in South Korea from 2010 to 2019.
The input data consist of various environmental and socio-demographic factors (a total of 17 variables). Environmental factors include annual mean air pollutants (e.g., PM10, O3, CO, SO2, NO2), temperature, amount of precipitation, heatwave and coldwave days, and mean NDVI. Socio-demographic factors, which represent socio-demographic vulnerability, include distance from urban areas, sick beds per 1,000 people, doctors per 1,000 people, smoking rate, percentage of population aged 65 and above, elderly living alone rate, high school graduate or less rate, and number of disabled persons.
The target variable, CVD-ASMR, covers deaths whose cause is reported on the death certificate as I00–I99 of the International Classification of Diseases. However, because of privacy restrictions associated with the project funding, we did not upload the target data. To use the target data, please contact Prof. Jungho Im (ersgis@unist.ac.kr) and Eunjin Kang (jek0420@unist.ac.kr).
We uploaded two CSV files: one for the proposed schemes (SGG_input_vars_CVD_P.csv) and one for the comparison scheme (SGG_input_vars_CVD_C.csv). The key difference between the two files is whether the environmental variables have been rescaled.
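A quick way to confirm which variables differ between the two files is to compare them column by column. The following is a minimal sketch using only the Python standard library; it assumes that both files share the same column names and row order, which the description does not state. The column names `SGG`, `year`, `PM10`, and `smoke` in the toy data are hypothetical placeholders, not the actual CSV headers.

```python
import csv
import io


def read_rows(handle):
    """Read an open CSV file handle into a list of dicts (one per row)."""
    return list(csv.DictReader(handle))


def as_number(value):
    """Convert a CSV cell to float when possible, else return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def differing_columns(rows_p, rows_c):
    """Return the sorted column names whose values differ between two tables.

    Assumption: both tables have the same columns and the same row order.
    """
    changed = set()
    for row_p, row_c in zip(rows_p, rows_c):
        for name in row_p:
            if as_number(row_p[name]) != as_number(row_c.get(name)):
                changed.add(name)
    return sorted(changed)


# Toy stand-ins for the two uploaded files (hypothetical headers and values).
toy_p = io.StringIO("SGG,year,PM10,smoke\nA,2010,0.0,21.5\nA,2011,1.0,20.9\n")
toy_c = io.StringIO("SGG,year,PM10,smoke\nA,2010,48.0,21.5\nA,2011,52.0,20.9\n")

changed = differing_columns(read_rows(toy_p), read_rows(toy_c))
print(changed)  # ['PM10']
```

With the real files, the same functions could be called on handles obtained from `open("SGG_input_vars_CVD_P.csv", newline="")` and `open("SGG_input_vars_CVD_C.csv", newline="")`; if the description holds, only environmental columns should be reported.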
The manuscript is under review; the title of the paper will be posted after publication.
Previous XAI-based analyses struggled to capture the regional effects of environmental variables, which made it difficult to identify key spatio-temporal risk factors. Our XAI-based risk factor analysis rests on two assumptions:
1. Environmental variables should be rescaled by region to account for the unequal effects of environmental factors, which likely stem from socio-demographic disparities, differing capacity to adapt to weather conditions, and unequal exposure-response to air pollutants.
2. District-level disease distribution highlights geographic disparity in socio-demographic vulnerability, whereas temporal variation in disease within each district underscores temporal environmental impacts.
Based on these two hypotheses, we rescaled the environmental variables and proposed two complementary schemes (P1, P2). To normalize the environmental factors, we applied min-max normalization, which scales the data to a range between 0 and 1. Additionally, to emphasize temporal deviations from the regional average mortality, we calculated the district-specific residual relative to the 10-year average.
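The two rescaling steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes that min-max normalization is applied separately within each district (one reading of "regionally rescaled") and that the residual is the yearly value minus the district's multi-year mean. The record keys `district`, `year`, `PM10`, and `asmr`, and all values, are hypothetical.

```python
from collections import defaultdict


def _values_by_district(records, value_key, district_key):
    """Group the values of value_key by district."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[district_key]].append(rec[value_key])
    return groups


def minmax_by_district(records, value_key, district_key="district"):
    """Min-max scale value_key to [0, 1] separately within each district."""
    groups = _values_by_district(records, value_key, district_key)
    scaled = []
    for rec in records:
        values = groups[rec[district_key]]
        lo, hi = min(values), max(values)
        # A constant series has no spread; map it to 0.0 to avoid dividing by zero.
        scaled.append((rec[value_key] - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled


def residual_from_district_mean(records, value_key, district_key="district"):
    """Subtract each district's multi-year mean from its yearly values."""
    groups = _values_by_district(records, value_key, district_key)
    means = {d: sum(v) / len(v) for d, v in groups.items()}
    return [rec[value_key] - means[rec[district_key]] for rec in records]


# Toy data: two districts, three years each (the dataset itself spans ten years).
records = [
    {"district": "A", "year": 2010, "PM10": 40.0, "asmr": 30.0},
    {"district": "A", "year": 2011, "PM10": 50.0, "asmr": 34.0},
    {"district": "A", "year": 2012, "PM10": 60.0, "asmr": 32.0},
    {"district": "B", "year": 2010, "PM10": 20.0, "asmr": 50.0},
    {"district": "B", "year": 2011, "PM10": 30.0, "asmr": 40.0},
    {"district": "B", "year": 2012, "PM10": 25.0, "asmr": 45.0},
]

pm10_scaled = minmax_by_district(records, "PM10")
asmr_residual = residual_from_district_mean(records, "asmr")
print(pm10_scaled)    # [0.0, 0.5, 1.0, 0.0, 1.0, 0.5]
print(asmr_residual)  # [-2.0, 2.0, 0.0, 5.0, -5.0, 0.0]
```

Under this reading, district B's PM10 values are lower than district A's in absolute terms, yet both span the full 0 to 1 range after scaling, so the model sees each district's year-to-year variation rather than its absolute pollution level.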
To evaluate the effectiveness of our proposed complementary schemes, we compared their model performance and SHAP analysis results with a previously established purely data-driven XAI strategy (comparison scheme; C). This strategy utilized raw input data (risk factors and disease target) without any rescaling.
Proposed scheme (P1): Examine the association between district-level disease distribution and diverse risk factors to highlight those related to spatial disease variation.
Proposed scheme (P2): Investigate the association between temporal disease variations by district and various factors to highlight the impact of risk factors on temporal disease variations.
Table 1. Summary of target and input variables, used in a case study of spatio-temporal CVD-ASMR risk factor analysis from 2010–2019 in South Korea.
| Role | Name | Abbreviation | Units | Data source |
|---|---|---|---|---|
| Target | Cardiovascular disease standardized mortality rate | CVD-ASMR | % | National Health Insurance Service |
| Input | Annual mean PM10 | PM10 | µg/m³ | Ministry of Environment |
| Input | Annual mean O3 | O3 | ppm | |
| Input | Annual mean NO2 | NO2 | ppm | |
| Input | Annual mean CO | CO | ppm | |
| Input | Annual mean SO2 | SO2 | ppm | |
| Input | Annual mean temperature | Temp | °C | Korea Meteorological Administration |
| Input | Annual amount of precipitation | precipitation | mm | |
| Input | Number of heatwave days | heatwave | day | |
| Input | Number of coldwave days | coldwave | day | |
| Input | Annual mean NDVI | NDVI | - | NASA, MODIS |
| Input | Distance from urban areas | distance_urban | Decimal degree | |
| Input | Sick bed | sickbed | Beds per 1,000 people | Statistics Korea |
| Input | Doctor | doctor | Doctors per 1,000 people | |
| Input | Smoking rate | smoke | % | National Health Insurance Service |
| Input | Percentage of population aged 65 and above | age65 | % | Ministry of the Interior and Safety |
| Input | Elderly living alone rate | solitary | % | Statistics Korea |
| Input | High school graduate or less rate | under_hs | % | |
| Input | Number of disabled persons | disable | Number of persons with disabilities per 100,000 people | |
Table 2. Summary of whether input and target variables were rescaled in the proposed and comparison schemes (o: rescaled; x: not rescaled).
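| Name | P1 | P2 | C |
|---|---|---|---|
| CVD-ASMR | x | o | x |
| PM10 | o | o | x |
| O3 | o | o | x |
| NO2 | o | o | x |
| CO | o | o | x |
| SO2 | o | o | x |
| Temp | o | o | x |
| precipitation | o | o | x |
| heatwave | o | o | x |
| coldwave | o | o | x |
| NDVI | o | o | x |
| distance_urban | x | x | x |
| sickbed | x | x | x |
| doctor | x | x | x |
| smoke | x | x | x |
| age65 | x | x | x |
| solitary | x | x | x |
| under_hs | x | x | x |
| disable | x | x | x |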
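For scripted use, the rescaling pattern in Table 2 can be encoded as a small lookup. The sketch below is one possible encoding; the variable names are the abbreviations from the tables, and whether they match the actual CSV column headers is an assumption. The helper name `is_rescaled` is hypothetical.

```python
# Variable abbreviations as listed in Table 2 (assumed, not verified,
# to match the column headers of the uploaded CSV files).
ENVIRONMENTAL = [
    "PM10", "O3", "NO2", "CO", "SO2",
    "Temp", "precipitation", "heatwave", "coldwave", "NDVI",
]
SOCIO_DEMOGRAPHIC = [
    "distance_urban", "sickbed", "doctor", "smoke",
    "age65", "solitary", "under_hs", "disable",
]
TARGET = "CVD-ASMR"

# Which variables are rescaled under each scheme, following Table 2:
# P1 rescales the environmental inputs, P2 additionally rescales the target,
# and the comparison scheme C uses raw values throughout.
RESCALED = {
    "P1": set(ENVIRONMENTAL),
    "P2": set(ENVIRONMENTAL) | {TARGET},
    "C": set(),
}


def is_rescaled(scheme, variable):
    """Return True if Table 2 marks the variable as rescaled in the scheme."""
    return variable in RESCALED[scheme]


print(is_rescaled("P1", "PM10"))   # True
print(is_rescaled("P1", TARGET))   # False
print(is_rescaled("P2", TARGET))   # True
print(is_rescaled("C", "PM10"))    # False
```

Keeping the scheme definitions in one mapping means a preprocessing script only needs the scheme name to decide which columns to transform.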
Created:
2024-10-17



