Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning
Source: DataONE. Updated: 2022-09-29. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:1fc553bcbb89d8dedbb096c6d804d8a8e590da08184989eac873f97f8f10d047
Description:
Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Please see the README file for a summary of the contents and the Replication Guide for a more detailed description. Article abstract: Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
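The idea in the abstract (corrupt part of the input, treat missing values as further corruption, and score reconstruction only on the originally observed cells) can be illustrated with a toy NumPy sketch. This is not the authors' software and makes no claim to match it. The function name `masked_dae_impute`, the single tanh hidden layer, the dropout-style corruption, and every hyperparameter are assumptions chosen for illustration, and missing values are assumed to be encoded as NaN.

```python
import numpy as np


def masked_dae_impute(X, n_hidden=16, n_draws=5, corrupt_rate=0.2,
                      lr=0.05, n_epochs=500, seed=0):
    """Toy denoising-autoencoder imputer (hypothetical helper, not MIDAS itself).

    X is a 2-D float array in which NaN marks a missing value.
    Returns a list of n_draws completed copies of X.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)              # True where a value was observed
    X0 = np.where(observed, X, 0.0)      # missing cells enter the network as zeros
    d = X0.shape[1]
    n_obs = max(int(observed.sum()), 1)  # guard against division by zero

    # Assumed architecture: one hidden layer, d -> n_hidden -> d.
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)

    def forward(inp):
        hidden = np.tanh(inp @ W1 + b1)
        return hidden, hidden @ W2 + b2

    for _ in range(n_epochs):
        # Corrupt a random subset of inputs, on top of the missing cells.
        keep = rng.random(X0.shape) > corrupt_rate
        inp = X0 * keep
        hidden, out = forward(inp)

        # Squared reconstruction error, counted only on observed cells.
        err = (out - X0) * observed
        g_out = 2.0 * err / n_obs

        # Backpropagate through the two layers by hand.
        g_W2 = hidden.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - hidden ** 2)
        g_W1 = inp.T @ g_pre
        g_b1 = g_pre.sum(axis=0)

        # Plain gradient-descent update (in place, so forward() sees it).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    # Each draw uses a fresh corruption mask, so the imputed values vary
    # across completed datasets while observed values stay fixed.
    draws = []
    for _ in range(n_draws):
        keep = rng.random(X0.shape) > corrupt_rate
        _, out = forward(X0 * keep)
        draws.append(np.where(observed, X0, out))
    return draws
```

Calling `masked_dae_impute(X, n_draws=5)` on a standardized numeric array returns five completed datasets. In a multiple-imputation workflow, each one would be analyzed separately and the estimates then combined. The sketch handles continuous columns only and leaves out the tuning and diagnostics a real analysis would need.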
Created:
2023-11-23



