Mental Health Temporal Trends from 1993 to 2023: Supervised Machine Learning and Poststratification with BRFSS and ACS
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/3CWJJX
Resource description:
IMPORTANCE Mental health trends have been shifting across generations and regions worldwide. Producing reliable and valid estimates of poor mental health prevalence at the population level is essential for designing effective prevention and intervention strategies.
BACKGROUND For over three decades, U.S. states have used the Behavioral Risk Factor Surveillance System (BRFSS) to collect survey data on mental health. Respondents are asked: "Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?"
CHALLENGE BRFSS data for 2023 are unavailable for Pennsylvania and Kentucky. This is a significant problem, since national statistics typically rely on combining data from all states. Pennsylvania, in particular, is one of the most populous states, and its absence necessitates novel methods for producing a valid national statistic.
SOLUTION To generate a national statistic of “poor mental health” prevalence by demographic characteristics, I applied machine learning prediction combined with poststratification.
METHODS I trained supervised machine learning classification models and used the predicted probabilities from the best-performing model. These predicted probabilities were hierarchically smoothed and then combined with poststratification weights to produce a national statistic. The poststratification weights were created from American Community Survey (ACS) Public Use Microdata Sample (PUMS) five-year data for 2019-2023. Multiple methods were used in producing the estimates, for example a year weight to capture drift in mental health responses over time. Although not shown, hundreds of hours were also spent exploring the use of synthetic data; code is available upon request.
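The poststratification step described under METHODS can be illustrated with a minimal sketch. This is not the author's code: the cell definitions (state by age group), the probabilities, the population counts, and the function names `poststratify` and `poststratify_by` are all hypothetical, and the hierarchical smoothing and year weighting are omitted. It only shows the general idea of weighting cell-level predicted probabilities by population counts of the kind that could be tabulated from ACS PUMS.

```python
from collections import defaultdict


def poststratify(cell_probs, cell_counts):
    """Population-weighted mean of cell-level predicted probabilities.

    cell_probs:  dict mapping a cell key to a predicted probability
    cell_counts: dict mapping the same cell keys to population counts
    Only cells present in both inputs contribute.
    """
    shared = cell_probs.keys() & cell_counts.keys()
    total = sum(cell_counts[c] for c in shared)
    if total <= 0:
        raise ValueError("no overlapping cells with positive population")
    return sum(cell_probs[c] * cell_counts[c] for c in shared) / total


def poststratify_by(cell_probs, cell_counts, key_index):
    """Poststratified estimate for each level of one cell dimension."""
    probs_by_level = defaultdict(dict)
    counts_by_level = defaultdict(dict)
    for cell, p in cell_probs.items():
        probs_by_level[cell[key_index]][cell] = p
    for cell, n in cell_counts.items():
        counts_by_level[cell[key_index]][cell] = n
    return {
        level: poststratify(probs_by_level[level], counts_by_level[level])
        for level in probs_by_level
        if level in counts_by_level
    }


# Hypothetical cells: (state, age_group) -> model-predicted probability.
# "PA" stands in for a state with no survey data, where the probability
# would come from the model rather than from observed responses.
cell_probs = {
    ("PA", "18-34"): 0.20,
    ("PA", "35+"): 0.10,
    ("OH", "18-34"): 0.25,
    ("OH", "35+"): 0.15,
}

# Hypothetical population counts for the same cells (made-up numbers).
cell_counts = {
    ("PA", "18-34"): 100,
    ("PA", "35+"): 300,
    ("OH", "18-34"): 200,
    ("OH", "35+"): 400,
}

national = poststratify(cell_probs, cell_counts)
by_age = poststratify_by(cell_probs, cell_counts, key_index=1)
```

Because the weights come from population counts rather than from survey respondents, a state with no survey data still contributes to the national figure through its model-predicted cell probabilities.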
RESULTS BRFSS temporal trends from 1993 to 2023 are illustrated using ground-truth confidence intervals and spline curves to show overall trends by demographic group. The ML+PS (machine learning plus poststratification) national statistics are shown in the graphs. Model performance metrics, a detailed estimate table, and trend visualizations are available below.
CONCLUSIONS Machine learning combined with hierarchical smoothing and poststratification offers an effective approach for generating a 2023 population estimate of poor mental health prevalence. Researchers should continue to explore the benefits of ML for producing national statistics from survey data designed to produce state-level prevalence estimates.
Created:
2025-06-28



