
Details on imputation methods in R or python.

Source: Figshare. Updated: 2025-11-07. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Details_on_imputation_methods_in_R_or_python_/30567182
Description:
Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occur at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion, unified for continuous and categorical variables, is met. The imputation models are saved for each variable and iteration and can be applied later to new observations at prediction time. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is compared to mean/mode imputation, linear regression imputation, mice, k-nearest neighbours, bagging, miceRanger and IterativeImputer on eight simulated datasets with simulated missingness (48 scenarios) and eight large public datasets using different prediction models. missForestPredict provides competitive results in prediction settings within short computation times.
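The idea of saving imputation models at development time and reusing them on new observations can be illustrated with a minimal sketch. This is not the missForestPredict algorithm and uses no random forests; it assumes the much simpler mean/mode imputation that the description lists as a comparison method. The function names `fit_imputer` and `apply_imputer` are hypothetical, missing values are assumed to be encoded as `None`, and every column is assumed to have at least one observed value.

```python
from statistics import mean, mode


def fit_imputer(rows, columns):
    """Learn one fill value per column from the development data.

    Numeric columns get the mean of their observed values,
    all other columns get the most frequent observed value (mode).
    """
    fills = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        numeric = all(isinstance(v, (int, float)) for v in observed)
        fills[col] = mean(observed) if numeric else mode(observed)
    return fills


def apply_imputer(rows, fills):
    """Return copies of the rows with missing entries replaced by stored fills."""
    return [
        {col: (fills[col] if r[col] is None else r[col]) for col in fills}
        for r in rows
    ]


# Development data with missing entries (hypothetical toy values).
train = [
    {"age": 30, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 50, "sex": "F"},
    {"age": 40, "sex": None},
]

# Fit once at model development and keep the learned fill values.
fills = fit_imputer(train, ["age", "sex"])
train_imputed = apply_imputer(train, fills)

# At prediction time, reuse the stored values; nothing is re-estimated
# from the new observation.
new_rows = [{"age": None, "sex": None}]
new_imputed = apply_imputer(new_rows, fills)
```

The point of the design is that `fills` is estimated only from the development data, so a single new observation can be imputed even when it arrives alone. An iterative, model-based imputer follows the same fit-then-apply pattern but stores fitted models instead of single fill values.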

Created:
2025-11-07