Code for Training and Optimizing XGBoost
Source: Mendeley Data. Updated: 2023-02-14. Indexed: 2024-06-27.
Download link:
https://www.doi.org/10.57760/sciencedb.06270
Resource description:
This code optimizes the performance of XGBoost by maximizing the generalization ability of the model.

Outline:

The XGBoost hyperparameters that affect its generalization ability are tuned: n_estimators, eta, max_depth, gamma, subsample and colsample_bytree. Cross-validation splits the model's training data into multiple training subsets and validation sets. While these hyperparameters are tuned, the average loss of the model over the training subsets and over the validation sets is calculated. These two averages simulate the real-time loss of the model on training data and on test data respectively, and so show the overall change in the generalization ability of XGBoost. To make this change more intuitive, we plot the real-time average loss on the training subsets and on the validation sets during tuning as two average loss curves. The generalization ability of the model is characterized by the convergence between the two curves. We therefore tune the hyperparameters to bring the two average loss curves steadily closer together, until they cannot get any closer, which maximizes the generalization ability of XGBoost.

Among the hyperparameters to be tuned, n_estimators is determined first, because it reveals the baseline generalization ability of XGBoost. Its value is the number of trees in the model at the point where the average loss curve on the validation sets becomes stable. On this basis, we tune the remaining hyperparameters one by one within their respective ranges, continually converging the two average loss curves of XGBoost, and finally optimize the performance of the model.

We use data on depression and anxiety in the elderly to test the performance of the optimized XGBoost. Then, based on the performance of the optimized model, we select features from the data.
We then improve the performance of XGBoost after feature selection using the same optimization method as above.

The submitted code contains the following files:
---xgboostfb.R: shows the optimization process for XGBoost;
---nlsaaXGB.R: shows the performance and feature selection results of the optimized XGBoost on the data on depression and anxiety in the elderly;
---ftmXGB.R: shows the performance of XGBoost without optimization after feature selection;
---ftmXGBfb.R: shows the performance of XGBoost with optimization after feature selection, and the corresponding optimization process.

Note: The data on depression and anxiety in the elderly (DOI:10.57760/sciencedb.06263) is derived from the Nottingham Longitudinal Study of Activity and Ageing (NLSAA): Morgan, K (1998) The Nottingham Longitudinal Study of Activity and Ageing, Age and Ageing, 27(S3), pp.5-11, ISSN: 0002-0729. To access the original NLSAA data, please contact Professor Kevin Morgan, one of the project leaders, to obtain permission (homepage: https://www.lboro.ac.uk/departments/ssehs/staff/kevin-morgan/ , email address: kevinmorgansleep@gmail.com).
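The tuning procedure in the description above can be illustrated with a minimal Python sketch. It is not taken from the submitted R files, and it uses only the standard library. The helper names (`average_curves`, `stable_round`, `final_gap`, `tune_one_at_a_time`), the `tol` and `patience` thresholds, the synthetic loss values, and the `toy_gap` function are all assumptions made for illustration. In real use the per-fold loss curves would come from a cross-validation routine such as `xgb.cv` in the xgboost R package, and `toy_gap` would be replaced by a function that reruns cross-validation for each candidate setting. Note also that the gap between the curves is the only criterion used here; a practical implementation would probably also watch the validation loss itself, so that a uniformly poor model is not preferred.

```python
from statistics import mean


def average_curves(fold_curves):
    """Average per-round losses across folds (one equal-length list per fold)."""
    return [mean(values) for values in zip(*fold_curves)]


def stable_round(val_curve, tol=1e-3, patience=3):
    """Number of trees after which the validation curve stops improving.

    A round counts as "flat" when the loss drops by less than `tol`.
    After `patience` consecutive flat rounds, the round just before the
    plateau began is returned (1-based, so it equals a tree count).
    """
    flat = 0
    for i in range(1, len(val_curve)):
        if val_curve[i - 1] - val_curve[i] < tol:
            flat += 1
            if flat == patience:
                return i - patience + 1
        else:
            flat = 0
    return len(val_curve)  # never stabilized within the rounds given


def final_gap(train_curve, val_curve):
    """Distance between the two average loss curves at the last round."""
    return abs(val_curve[-1] - train_curve[-1])


def tune_one_at_a_time(evaluate, base_params, grid):
    """Tune hyperparameters one by one, keeping the value with the smallest gap."""
    params = dict(base_params)
    for name, candidates in grid.items():
        params[name] = min(candidates, key=lambda v: evaluate({**params, name: v}))
    return params


# Synthetic per-fold loss curves (illustrative values, not real results).
train_folds = [[0.9, 0.4, 0.2, 0.15, 0.14, 0.14, 0.14],
               [1.1, 0.6, 0.4, 0.25, 0.16, 0.14, 0.14]]
val_folds = [[1.0, 0.6, 0.4, 0.4, 0.4, 0.4, 0.4],
             [1.2, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6]]

train_curve = average_curves(train_folds)
val_curve = average_curves(val_folds)
n_trees = stable_round(val_curve)       # candidate value for n_estimators
gap = final_gap(train_curve, val_curve)


def toy_gap(params):
    """Hypothetical stand-in for: run K-fold CV, return the train/validation gap."""
    return (abs(params["eta"] - 0.1)
            + 0.01 * abs(params["max_depth"] - 4)
            + 0.1 * abs(params["subsample"] - 0.8))


base = {"eta": 0.3, "max_depth": 6, "subsample": 1.0}
grid = {"eta": [0.05, 0.1, 0.3],
        "max_depth": [2, 4, 6],
        "subsample": [0.6, 0.8, 1.0]}
tuned = tune_one_at_a_time(toy_gap, base, grid)

print("n_estimators candidate:", n_trees)
print("gap between curves:", gap)
print("tuned parameters:", tuned)
```

The search is coordinate-wise, matching the "one by one" order in the description: each hyperparameter is fixed at its best value before the next one is examined, so the result can depend on the order of the entries in `grid`.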
Created:
2023-02-14



