Replication Data for: Translating Survey-Based Machine Learning into Stroke-Specific Triage: Non-Linear Thresholds and Competing Risk Analysis in a 9-Year Longitudinal Cohort

Name: Replication Data for: Translating Survey-Based Machine Learning into Stroke-Specific Triage: Non-Linear Thresholds and Competing Risk Analysis in a 9-Year Longitudinal Cohort
Creator: Mendeley Data
Published: 2026-03-18 16:44:46
License: No description available

DataCite Commons | Updated 2026-03-18 | Indexed 2026-05-04

Download link:

https://data.mendeley.com/datasets/c95bhdxfbv/1

Resource description:

This dataset provides the analysis code, derived analytical data, and tabulated results for the study "Translating Survey-Based Machine Learning into Stroke-Specific Triage: Non-Linear Thresholds and Competing Risk Analysis in a 9-Year Longitudinal Cohort." Using data from the China Health and Retirement Longitudinal Study (CHARLS, 2011–2020), we developed an interpretable machine-learning survival framework for long-term cardiovascular disease (CVD) prediction in 9,551 community-dwelling Chinese older adults. A Random Survival Forest (RSF) incorporating 40 survey-based and biochemical variables was optimized via 5-fold cross-validation, with competing mortality addressed through the Aalen-Johansen estimator. Model interpretation employed permutation importance, restricted cubic splines, and an eight-variable Cox surrogate nomogram for bedside risk communication.

The dataset contains three components:

(1) Code: Eleven Python scripts organized by analytical stage (data preparation, modeling, interpretation, evaluation, clinical tools, and sensitivity analyses), plus three utility scripts for generating figures and tables. Scripts are designed to run sequentially from Step 1 through Step 6.

(2) Derived Data: The integrated analysis dataset (n = 9,551; CSV format) containing participant-level outcome indicators, follow-up durations, and 40 baseline predictor variables after data cleaning and cohort selection. Feature column name lists are also included.

(3) Results: Ten CSV files containing baseline characteristics, Cox hazard ratios, RSF permutation importance rankings, calibration data, competing risk cumulative incidence estimates, restricted cubic spline non-linearity test results, sensitivity analysis outputs, subgroup-stratified performance metrics, and resilience phenotype feature comparisons.
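For readers unfamiliar with the competing-risk step, the following is a minimal, dependency-free sketch of an Aalen-Johansen cumulative incidence calculation. It is not taken from the deposited scripts: the function name `aalen_johansen_cif`, the event coding (0 = censored, 1 = index event, 2 = competing death), and the six-person toy cohort are all assumptions made for illustration only.

```python
from collections import Counter


def aalen_johansen_cif(durations, events, cause=1):
    """Return [(time, cumulative incidence)] for one cause (hypothetical helper).

    Assumed coding: 0 = censored, any other integer = an event type.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    leaving = Counter()       # everyone who exits the risk set at each time
    any_events = Counter()    # events of any type at each time
    cause_events = Counter()  # events of the cause of interest at each time
    for t, e in zip(durations, events):
        leaving[t] += 1
        if e != 0:
            any_events[t] += 1
        if e == cause:
            cause_events[t] += 1

    at_risk = len(durations)
    event_free = 1.0  # all-cause Kaplan-Meier survival just before time t
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        if any_events[t] > 0:
            # Increment: P(event-free before t) * cause-specific hazard at t
            cif += event_free * cause_events[t] / at_risk
            event_free *= 1.0 - any_events[t] / at_risk
            curve.append((t, cif))
        at_risk -= leaving[t]
    return curve


# Toy cohort (illustrative values, not CHARLS data)
toy_durations = [1, 2, 2, 3, 4, 5]
toy_events = [1, 2, 0, 1, 0, 2]

cif_index = aalen_johansen_cif(toy_durations, toy_events, cause=1)
cif_death = aalen_johansen_cif(toy_durations, toy_events, cause=2)
for t, value in cif_index:
    print(f"t={t}: CIF(index event)={value:.4f}")
```

Unlike one minus a Kaplan-Meier curve that treats competing deaths as censoring, the cause-specific incidences computed this way never sum to more than 1 minus the all-cause event-free probability. In practice the `AalenJohansenFitter` class in lifelines (one of the listed dependencies) provides an equivalent estimator with variance estimates.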
Note: Raw CHARLS microdata cannot be redistributed and must be obtained separately from the CHARLS Data Portal (https://charls.charlsdata.com/) under the CHARLS data use agreement. The derived dataset included here contains only processed analytical variables.

Software requirements: Python >= 3.9 with pandas, numpy, scikit-learn, scikit-survival, lifelines, statsmodels, shap, matplotlib, and seaborn.
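Before running the scripts, it may help to confirm that the local environment satisfies the stated requirements. The sketch below uses only the Python standard library; the helper name `check_environment` and the report layout are assumptions, and the package names are the PyPI distribution names corresponding to the list above.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI, taken from the requirements list
REQUIRED = (
    "pandas", "numpy", "scikit-learn", "scikit-survival", "lifelines",
    "statsmodels", "shap", "matplotlib", "seaborn",
)


def check_environment(min_python=(3, 9), packages=REQUIRED):
    """Report the interpreter version check and installed/missing packages."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "installed": {},
        "missing": [],
    }
    for name in packages:
        try:
            report["installed"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["missing"].append(name)
    return report


report = check_environment()
print("Python >= 3.9:", report["python_ok"])
for name, version in report["installed"].items():
    print(f"  {name} {version}")
if report["missing"]:
    print("Missing packages:", ", ".join(report["missing"]))
```

The dataset description does not pin package versions, so this check only reports what is installed rather than validating specific releases.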

Provider:

Mendeley Data

Created:

2026-03-18