Simulated datasets description.

Source: Figshare. Updated 2025-11-07; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Simulated_datasets_description_/30567176
Description:
Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occur at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion, unified for continuous and categorical variables, is met. The imputation models are saved for each variable and iteration and can be applied later to new observations at prediction time. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is compared to mean/mode imputation, linear regression imputation, mice, k-nearest neighbours, bagging, miceRanger and IterativeImputer on eight simulated datasets with simulated missingness (48 scenarios) and eight large public datasets using different prediction models. missForestPredict provides competitive results in prediction settings within short computation times.
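
The fit-then-apply workflow described above (impute iteratively, keep the per-variable models from every iteration, reuse them on new observations) can be illustrated with a minimal, dependency-free Python sketch. It is not the missForestPredict API: the names `knn_learner`, `fit_imputer` and `impute_new` are hypothetical, a k-nearest-neighbour average stands in for the random forests, the stopping rule (total squared change in imputed values below a tolerance) is an assumed stand-in for the package's convergence criterion, and only continuous variables are handled.

```python
from statistics import mean


def knn_learner(X, y, k=2):
    """Return a predictor that averages y over the k nearest training rows.

    Assumption: this stands in for a random forest purely to keep the
    sketch free of third-party dependencies.
    """
    rows = list(zip(X, y))

    def predict(x):
        nearest = sorted(
            rows, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x))
        )[:k]
        return mean(v for _, v in nearest)

    return predict


def fit_imputer(data, max_iter=10, tol=1e-6, learner=knn_learner):
    """Impute training rows (None marks a missing value) and save the models."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Initialization: mean of the observed values in each column.
    init = [mean(r[j] for r in data if r[j] is not None) for j in range(n_cols)]
    filled = [[init[j] if v is None else v for j, v in enumerate(r)] for r in data]
    models = []  # one {column index: predictor} dict per iteration
    for _ in range(max_iter):
        step, change = {}, 0.0
        for j in range(n_cols):
            others = [c for c in range(n_cols) if c != j]
            # Train only on rows where column j was actually observed.
            obs = [i for i in range(len(data)) if not missing[i][j]]
            predict = learner(
                [[filled[i][c] for c in others] for i in obs],
                [filled[i][j] for i in obs],
            )
            step[j] = predict
            for i in range(len(data)):
                if missing[i][j]:
                    new = predict([filled[i][c] for c in others])
                    change += (new - filled[i][j]) ** 2
                    filled[i][j] = new
        models.append(step)
        if change < tol:  # assumed stopping rule, see the note above
            break
    return filled, {"init": init, "models": models}


def impute_new(row, state):
    """Apply the saved initialization and models to one new observation."""
    miss = [v is None for v in row]
    out = [state["init"][j] if m else v for j, (v, m) in enumerate(zip(row, miss))]
    for step in state["models"]:  # replay iterations in training order
        for j, predict in step.items():
            if miss[j]:
                out[j] = predict([out[c] for c in range(len(out)) if c != j])
    return out


# Toy data: two continuous variables with one missing value each.
train = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0], [None, 10.0]]
filled, state = fit_imputer(train)
new_obs = impute_new([2.5, None], state)
```

The point of the design is that `impute_new` never refits anything: it replays the stored initialization and the stored per-iteration predictors, so a single new observation can be imputed at prediction time without access to the training data.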

Created: 2025-11-07