lightbgm_shapley code

NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://data.mendeley.com/datasets/bxmdt72wp2
Resource description:
In this study, we employed a LightGBM-based machine learning framework to analyze the response of soil erosion to extreme precipitation. The workflow consisted of several key steps:
1. Data preparation and splitting: The dataset was read from a CSV file, with the last column defined as the target variable and the remaining columns as predictors. Data were randomly split into training and testing sets (80%–20%) using a fixed random seed to ensure reproducibility.
2. Hyperparameter optimization: To improve model performance, the hyperparameters of the LightGBM model (including num_leaves, learning_rate, feature_fraction, min_child_weight, subsample, and colsample_bytree) were optimized using Hyperopt with a tree-structured Parzen estimator (TPE) algorithm. Five-fold cross-validation was applied on the training set, and the mean RMSE was used as the objective for optimization. A total of 1000 evaluations were performed to identify the best combination of hyperparameters.
3. Model training and evaluation: Using the optimized parameters, the LightGBM model was trained under five-fold cross-validation on the entire dataset to assess predictive performance. The model was evaluated using RMSE, MAE, and R² metrics. Finally, the model was retrained on the full dataset to obtain a final predictive model, and its accuracy was verified on the held-out test set.
4. SHAP-based interpretation: To interpret the contribution of each predictor to soil erosion, we employed the SHAP (SHapley Additive exPlanations) framework. Both global (summary, bar, and beeswarm plots) and local (dependence, waterfall, and force plots) explanations were generated to reveal the relative importance and nonlinear interactions of the driving factors. The mean absolute SHAP values of each predictor were calculated and visualized to quantify their overall contributions.
5. Visualization and reproducibility: All SHAP-based plots and the feature importance table were saved for further analysis.
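The splitting and evaluation logic in steps 1 to 3 can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset's own code: the function names are hypothetical, the data are synthetic, and a one-predictor least-squares line stands in for the LightGBM model so that the example runs without third-party packages. The mean RMSE returned by `cv_mean_rmse` is the kind of quantity that the description says was minimized during hyperparameter optimization.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def fit_linear(rows):
    """Stand-in model (NOT LightGBM): least-squares line on the first column.

    As in the description, the last column of each row is the target.
    """
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[-1] for r in rows) / n
    sxx = sum((r[0] - mean_x) ** 2 for r in rows)
    sxy = sum((r[0] - mean_x) * (r[-1] - mean_y) for r in rows)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cv_mean_rmse(rows, k=5):
    """Mean validation RMSE over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(rows), k):
        model = fit_linear([rows[i] for i in train_idx])
        y_true = [rows[i][-1] for i in val_idx]
        y_pred = [model(rows[i][0]) for i in val_idx]
        scores.append(regression_metrics(y_true, y_pred)["rmse"])
    return sum(scores) / k


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
noise = random.Random(0)
data = [(float(x), 2.0 * x + 1.0 + noise.gauss(0.0, 0.5)) for x in range(100)]
train, test = train_test_split(data)
final_model = fit_linear(train)
test_scores = regression_metrics(
    [r[-1] for r in test], [final_model(r[0]) for r in test]
)
print("CV mean RMSE on training set:", cv_mean_rmse(train))
print("Held-out test metrics:", test_scores)
```

In this sketch the final model is fitted on the training portion only, so the test rows stay unseen until the last evaluation. To follow the description more closely, one would replace `fit_linear` with a LightGBM regressor and wrap `cv_mean_rmse` in a Hyperopt objective.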
This framework allows flexible adaptation to datasets of different spatial and temporal scales, ensuring robustness and reproducibility.
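To make the SHAP step more concrete, the sketch below computes exact Shapley values by enumerating feature coalitions, then averages their absolute values per feature, which is the quantity the description uses to rank predictors. It is an illustration under stated assumptions: the toy model `toy_model`, the zero baseline, and the function names are hypothetical, and features missing from a coalition are replaced by a single baseline value. The SHAP library's tree explainer uses a faster, tree-specific algorithm, so this brute-force version only shows the underlying definition.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to one baseline point.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(predict, samples, baseline):
    """Mean absolute Shapley value per feature over a list of samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def toy_model(z):
    """Hypothetical model: one linear term plus one interaction term."""
    return 3.0 * z[0] + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 2.0, 3.0], [-1.0, 0.0, 0.0]]
phi = shapley_values(toy_model, samples[0], baseline)
importance = mean_abs_shap(toy_model, samples, baseline)
print("Shapley values for the first sample:", phi)
print("Mean absolute Shapley value per feature:", importance)
```

For the first sample the values sum to the difference between the model output at that sample and at the baseline, and the interaction term's contribution is shared equally between the two features involved.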
Created:
2026-01-19