
Dataset for explainable AI-based spatiotemporal risk factor analysis in public health: A case study of South Korea

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13944581
Resource description:
This dataset supports explainable artificial intelligence (XAI)-based risk factor analysis of the cardiovascular disease age-standardized mortality rate (CVD-ASMR) in South Korea from 2010 to 2019. The input data consist of 18 environmental and socio-demographic variables. Environmental factors include annual mean air pollutant concentrations (PM10, O3, CO, SO2, and NO2), annual mean temperature, annual amount of precipitation, the numbers of heatwave and coldwave days, and annual mean NDVI. Socio-demographic factors, which represent socio-demographic vulnerability, include distance from urban areas, sick beds per 1,000 people, doctors per 1,000 people, smoking rate, percentage of the population aged 65 and above, rate of elderly people living alone, rate of people with a high school education or less, and number of persons with disabilities (per 100,000 people). The target variable, CVD-ASMR, covers deaths whose cause on the death certificate is coded I00–I99 in the International Classification of Diseases (ICD-10). However, because of privacy restrictions associated with the project funding, the target data are not included in this upload. To use the target data, please contact Prof. Jungho Im (ersgis@unist.ac.kr) and Eunjin Kang (jek0420@unist.ac.kr). Two CSV files are provided: one for the proposed schemes (SGG_input_vars_CVD_P.csv) and one for the comparison scheme (SGG_input_vars_CVD_C.csv). The key difference between the two files is whether the environmental variables have been rescaled. The manuscript is under review; its title will be posted after publication.

Previous XAI-based analyses struggled to capture the regional effects of environmental variables, making it difficult to identify key spatio-temporal risk factors. Our XAI-based risk factor analysis therefore rests on two assumptions:
1. Environmental variables should be rescaled regionally to account for their unequal effects across regions, which likely stem from socio-demographic disparities, differences in adaptive capacity to weather conditions, and unequal exposure-response relationships for air pollutants.
2. The district-level distribution of disease highlights geographic disparities in socio-demographic vulnerability, whereas temporal variation in disease within each district highlights temporal environmental impacts.

Based on these two assumptions, we rescaled the environmental variables and proposed two complementary schemes (P1 and P2). To normalize the environmental factors, we applied min-max normalization, which scales the data to the range 0 to 1. Additionally, to emphasize temporal deviations from each district's average mortality, we calculated a district-specific residual relative to the 10-year (2010–2019) average. To evaluate the effectiveness of the proposed schemes, we compared their model performance and SHAP analysis results with those of a previously established, purely data-driven XAI strategy (comparison scheme, C), which uses the raw input data (risk factors and disease target) without any rescaling.

Proposed scheme (P1): examines the association between the district-level disease distribution and diverse risk factors to highlight the factors related to spatial disease variation.
Proposed scheme (P2): investigates the association between temporal disease variation within each district and various risk factors to highlight the impact of risk factors on temporal disease variation.

Table 1. Summary of target and input variables used in a case study of spatio-temporal CVD-ASMR risk factor analysis in South Korea, 2010–2019.
Name | Abbreviation | Units | Data source
Target variable
Cardiovascular disease age-standardized mortality rate | CVD-ASMR | % | National Health Insurance Service
Input variables
Annual mean PM10 | PM10 | µg/m³ | Ministry of Environment
Annual mean O3 | O3 | ppm |
Annual mean NO2 | NO2 | ppm |
Annual mean CO | CO | ppm |
Annual mean SO2 | SO2 | ppm |
Annual mean temperature | Temp | °C | Korea Meteorological Administration
Annual amount of precipitation | precipitation | mm |
Number of heatwave days | heatwave | days |
Number of coldwave days | coldwave | days |
Annual mean NDVI | NDVI | - | NASA, MODIS
Distance from urban areas | distance_urban | Decimal degree |
Sick beds | sickbed | Beds per 1,000 people | Statistics Korea
Doctors | doctor | Doctors per 1,000 people |
Smoking rate | smoke | % | National Health Insurance Service
Percentage of population aged 65 and above | age65 | % | Ministry of the Interior and Safety
Elderly living alone rate | solitary | % | Statistics Korea
High school graduate or less rate | under_hs | % |
Number of disabled persons | disable | Persons with disabilities per 100,000 people |

Table 2. Summary of whether input and target variables were rescaled in the proposed and comparison schemes (o = rescaled, x = not rescaled).
Name | P1 | P2 | C
CVD-ASMR | x | o | x
PM10 | o | o | x
O3 | o | o | x
NO2 | o | o | x
CO | o | o | x
SO2 | o | o | x
Temp | o | o | x
precipitation | o | o | x
heatwave | o | o | x
coldwave | o | o | x
NDVI | o | o | x
distance_urban | x | x | x
sickbed | x | x | x
doctor | x | x | x
smoke | x | x | x
age65 | x | x | x
solitary | x | x | x
under_hs | x | x | x
disable | x | x | x
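Table 2 can be read as a small preprocessing recipe. The Python sketch below shows one possible way to apply it; it is a hedged illustration, not the authors' code. The helper names (minmax_by_district, residual_by_district, build_scheme) and the "district"/"year" record keys are hypothetical, because the column layout of the CSV files is not described here. Two details are also assumptions: that min-max normalization is computed within each district across years (one reading of "regionally rescaled"), and that the P2 residual is taken against each district's mean over its available years. Because the target is not part of the upload, the P2 step only applies once the CVD-ASMR data have been obtained from the authors.

```python
"""Minimal sketch of the Table 2 rescaling (not the authors' code).

Assumptions: records are dicts with hypothetical "district" and "year" keys;
min-max normalization is computed within each district across its years;
the P2 target residual is taken against each district's mean over all of
its years (the 10-year average when 2010-2019 are all present).
"""
from collections import defaultdict
from statistics import fmean

# Environmental variables marked "o" for P1 and P2 in Table 2.
ENV_VARS = ("PM10", "O3", "NO2", "CO", "SO2", "Temp", "precipitation",
            "heatwave", "coldwave", "NDVI")
TARGET = "CVD-ASMR"  # not included in the upload; obtained from the authors


def _group_by_district(records):
    """Map each district ID to the list of its yearly records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["district"]].append(rec)
    return groups


def minmax_by_district(records, columns):
    """Return copies of `records` with `columns` scaled to [0, 1] per district."""
    scaled = [dict(rec) for rec in records]  # leave the caller's data untouched
    for rows in _group_by_district(scaled).values():
        for col in columns:
            lo = min(r[col] for r in rows)
            hi = max(r[col] for r in rows)
            span = hi - lo
            for r in rows:
                # A constant series has no range; map it to 0.0 (assumption).
                r[col] = (r[col] - lo) / span if span else 0.0
    return scaled


def residual_by_district(records, column):
    """Return copies of `records` with `column` minus its district mean."""
    out = [dict(rec) for rec in records]
    for rows in _group_by_district(out).values():
        mean = fmean(r[column] for r in rows)
        for r in rows:
            r[column] -= mean
    return out


def build_scheme(records, scheme, env_vars=ENV_VARS, target=TARGET):
    """Apply the Table 2 rescaling for scheme "P1", "P2", or "C"."""
    if scheme == "C":
        return [dict(rec) for rec in records]  # raw inputs and raw target
    if scheme not in ("P1", "P2"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    data = minmax_by_district(records, env_vars)  # environment: P1 and P2
    if scheme == "P2":
        data = residual_by_district(data, target)  # target: P2 only
    return data


if __name__ == "__main__":
    toy = [  # made-up numbers for illustration only
        {"district": "A", "year": 2010, "PM10": 40.0, "CVD-ASMR": 10.0},
        {"district": "A", "year": 2011, "PM10": 60.0, "CVD-ASMR": 14.0},
        {"district": "B", "year": 2010, "PM10": 30.0, "CVD-ASMR": 20.0},
        {"district": "B", "year": 2011, "PM10": 30.0, "CVD-ASMR": 22.0},
    ]
    for row in build_scheme(toy, "P2", env_vars=["PM10"]):
        print(row)
```

Each function returns copies, so the raw records, which are exactly what scheme C uses, stay unchanged. Mapping a district whose value never changes to 0.0 is only a guard against division by zero; the description does not say how such cases were handled.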
Created: 2024-10-17