To Adjust or not to Adjust? Estimating the Average Treatment Effect in Randomized Experiments with Missing Covariates

Source: Figshare. Updated 2022-09-12. Indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/To_adjust_or_not_to_adjust_Estimating_the_average_treatment_effect_in_randomized_experiments_with_missing_covariates/21082244

Description:

Randomized experiments allow for consistent estimation of the average treatment effect based on the difference in mean outcomes without strong modeling assumptions. Appropriate use of pretreatment covariates can further improve the estimation efficiency. Missingness in covariates is nevertheless common in practice, and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates, and is asymptotically more efficient than the difference in means if at least one completely observed covariate is predictive of the outcome. Then what is the additional gain of adjusting for covariates subject to missingness? To reconcile the conflicting recommendations in the literature, we analyze and compare five strategies for handling missing covariates in randomized experiments under the design-based framework, and recommend the missingness-indicator method, as a known but not so popular strategy in the literature, due to its multiple advantages. First, it removes the dependence of the regression-adjusted estimators on the imputed values for the missing covariates. Second, it does not require modeling the missingness mechanism, and yields consistent estimators even when the missingness mechanism is related to the missing covariates and unobservable potential outcomes. Third, it ensures large-sample efficiency over the complete-covariate analysis and the analysis based on only the imputed covariates. Lastly, it is easy to implement via least squares. We also propose modifications to it based on asymptotic and finite sample considerations. Importantly, our theory views randomization as the basis for inference, and does not impose any modeling assumptions on the data-generating process or missingness mechanism. Supplementary materials for this article are available online.
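The missingness-indicator method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `missingness_indicator_ate`, the `fill` parameter, and the choice of a fully interacted least-squares fit on centered covariates are all assumptions made for this example. Missing entries are assumed to be encoded as NaN, and the treatment as a 0/1 vector.

```python
import numpy as np


def missingness_indicator_ate(y, z, x, fill=0.0):
    """Hypothetical sketch of a missingness-indicator estimate of the ATE.

    y    : (n,) outcomes
    z    : (n,) treatment indicators coded 0/1
    x    : (n, p) covariates, with np.nan marking missing entries
    fill : constant used to impute missing entries (assumed arbitrary)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]

    miss = np.isnan(x)
    # Step 1: impute every missing entry with the same constant.
    x_imp = np.where(miss, fill, x)
    # Step 2: add one missingness indicator per covariate that has
    # at least one missing value.
    indicators = miss[:, miss.any(axis=0)].astype(float)
    # Step 3: augmented covariates, centered at their sample means.
    w = np.hstack([x_imp, indicators])
    w = w - w.mean(axis=0)
    # Step 4: least squares of y on (1, z, w, z * w); the coefficient
    # of z is taken as the point estimate.
    design = np.column_stack([np.ones_like(z), z, w, z[:, None] * w])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]
```

Calling the function with two different values of `fill` on the same data should return the same estimate up to floating-point error, provided the design within each treatment arm has full rank. This matches the first advantage listed in the abstract: the estimate does not depend on the imputed values.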

Created:

2022-09-12