Contains the following files: Fig A. Frequency that each variable is selected for the stepwise variable selection method when sample size changes.

Source: NIAID Data Ecosystem; indexed 2026-03-08

Download link:

https://figshare.com/articles/dataset/_Improved_Variable_Selection_Algorithm_Using_a_LASSO_Type_Penalty_with_an_Application_to_Assessing_Hepatitis_B_Infection_Relevant_Factors_in_Community_Residents_/1494886

Resource description:

Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig B. Frequency that each variable is selected for the stability selection method when sample size changes. Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig C. Frequency that each variable is selected for the LASSO variable selection method when sample size changes. Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig D. Frequency that each variable is selected for the Bolasso variable selection method when sample size changes. Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig E. Frequency that each variable is selected for the two-stage hybrid variable selection method when sample size changes. Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig F. Frequency that each variable is selected for the bootstrap ranking variable selection method when sample size changes. Plot based on 100 simulations using various sample sizes (n = 100, 200, 300 and 500). Left panel: the number of true predictors r = 8; right panel: the number of true predictors r = 12. Red bars represent the selection frequency of significant variables (true non-zero predictors) and grey bars represent that of noise variables (true zero predictors) in the simulated data.

Fig G. Sensitivity analysis based on the AUC metric, evaluating the performance of the compared methods as the number of true predictors (r = 8, 12, 16 and 20) and the sample size (n = 50, 100, 200, 300 and 500) change, with respect to the small-effect predictors of group 1. A total of t = 100 variables were simulated. Six methods were compared: the stepwise, stability selection, LASSO, Bolasso, two-stage hybrid and bootstrap ranking procedures.

Fig H. Sensitivity analysis based on the AUC metric, evaluating the performance of the compared methods as the number of true predictors (r = 8, 12, 16 and 20) and the sample size (n = 50, 100, 200, 300 and 500) change, with respect to the small-effect predictors of group 2. A total of t = 100 variables were simulated. Six methods were compared: the stepwise, stability selection, LASSO, Bolasso, two-stage hybrid and bootstrap ranking procedures.

Fig I. The estimation of the tuning parameter λ and the coefficients for the LASSO model. (A): The deviance, with error bars, of the LASSO logistic regression model under 10-fold cross-validation across different values of the tuning parameter (log scale). The optimal model is the one with a deviance of 0.2301, reached when the tuning parameter is 0.0017. (B): The path of the estimated coefficients over a grid of values for λ, and the variables selected at the optimal λ.

Table A. R code of the two-stage hybrid procedure. The R function TSLasso was used for establishing the two-stage hybrid procedure.

Table B. R code of the bootstrap ranking procedure. The R function Bootranking was used for establishing the bootstrap ranking procedure.

Table C. A de-identified dataset of this work, made publicly available.

(DOCX)
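As a rough illustration of what the selection-frequency plots (Figs A to F) measure, the sketch below repeats a simulation many times, fits a LASSO each time, and records the fraction of runs in which each variable receives a non-zero coefficient. It is a simplified stand-in, not the authors' code: it assumes a linear response rather than the logistic model used in the study, uses a fixed penalty `lam` instead of a cross-validated one, and all function names, the effect size, and the default settings are hypothetical.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def simulate(n, p, r, effect, rng):
    """Draw n samples of p Gaussian variables; only the first r affect y."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(effect * row[j] for j in range(r)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def selection_frequency(n, p, r, lam, n_sims=100, effect=1.0, seed=0):
    """Fraction of simulations in which each variable has a non-zero coefficient."""
    rng = random.Random(seed)
    counts = [0] * p
    for _ in range(n_sims):
        X, y = simulate(n, p, r, effect, rng)
        beta = lasso_cd(X, y, lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_sims for c in counts]


if __name__ == "__main__":
    # Small illustrative run: 3 true predictors among 10 variables.
    freq = selection_frequency(n=100, p=10, r=3, lam=0.3, n_sims=20)
    for j, f in enumerate(freq):
        label = "true" if j < 3 else "noise"
        print(f"variable {j} ({label}): selected in {f:.0%} of runs")
```

In a plot like those described, the first `r` frequencies would correspond to the red bars (true non-zero predictors) and the rest to the grey bars (noise variables). Repeating the call for several values of `n` would give the per-sample-size comparison.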

Created:

2015-07-27
