Salvaging Data Records with Missing Data, Part 2: Incorporating Imputation Bounds and the Power of Stan Hamiltonian Monte Carlo
Source: DataCite Commons | Updated: 2023-10-17 | Indexed: 2025-04-16
Download link:
http://dataverse.jpl.nasa.gov/citation?persistentId=doi:10.48577/jpl.0UAXXV
Resource description:
When doing multivariate data analysis, one common obstacle is the presence of incomplete observations, i.e., observations for which one or more key fields are missing data. Rather than deleting entire observations that contain missing data, which can lead to small sample sizes and biased inferences, data imputation methods can be used to statistically "fill in" missing data. Imputing data can help combat small sample sizes by using the existing information in partially complete observations with the end goal of producing less biased and higher confidence inferences. In addition, the knowledge contained within the incomplete observations is no longer lost when the partial data records are used, thus the effort spent collecting that data is not wasted effort.

At the IEEE Aerospace Conference in 2021, we presented a methodology for imputing data using the Monotone Data Augmentation (MDA) algorithm, provided that the data follows two assumptions: that it is missing at random (MAR) and it is approximately multivariate t-distributed. An example was presented in the context of the NASA Instrument Cost Model (NICM), namely the NICM System Tool, which models the total (or system) cost of an instrument, typically as a function of that instrument's total mass and power.

In this paper, we outline the benefits of a fully Bayesian approach which samples simultaneously from the joint posterior distribution of model parameters and the imputed values for the missing data. This approach is preferred over multiple imputation approaches like MDA because it is more compatible with complex model forms and limited-range missing covariates. The new imputation approach is applied in two different ways in the NICM System Tool and NICM Subsystem Tool, as an example to demonstrate the utility of the methods for using known bounds and improving model estimates. The example models are implemented in Stan, a statistical-modeling tool enabling Hamiltonian Monte Carlo (HMC). The
Provider:
Root
Created:
2023-10-15
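
Illustrative sketch (not from the dataset or the paper): the description above mentions sampling jointly from the posterior of the model parameters and the imputed values, with known bounds on a missing covariate. The toy example below may help make that idea concrete. It is only a sketch under loud assumptions: it uses a plain random-walk Metropolis sampler in pure Python rather than Stan or HMC, a one-parameter linear model rather than any NICM model, and every name and number in it (`DATA`, `log_post`, `sample`, the bounds, the priors) is hypothetical.

```python
import math
import random

# Hypothetical toy data: five complete (x, y) records plus one record whose
# covariate x is missing but is assumed to lie within known bounds.
DATA = {
    "complete": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)],
    "y_partial": 6.0,        # response observed for the incomplete record
    "bounds": (2.5, 3.5),    # assumed known range for the missing x
}

def log_post(beta, x_mis, data, sigma=1.0, mu_x=3.0, tau_x=2.0):
    """Unnormalised log posterior of (beta, x_mis).

    Assumed model: y ~ Normal(beta * x, sigma), x ~ Normal(mu_x, tau_x),
    flat prior on beta, and x_mis restricted to its bounds.
    """
    lo, hi = data["bounds"]
    if not (lo <= x_mis <= hi):
        return -math.inf  # zero posterior density outside the bounds
    lp = -0.5 * ((x_mis - mu_x) / tau_x) ** 2
    for x, y in data["complete"]:
        lp += -0.5 * ((y - beta * x) / sigma) ** 2
    lp += -0.5 * ((data["y_partial"] - beta * x_mis) / sigma) ** 2
    return lp

def sample(data, n_iter=20000, burn=2000, seed=0):
    """Random-walk Metropolis over the joint vector (beta, x_mis)."""
    rng = random.Random(seed)
    lo, hi = data["bounds"]
    beta, x_mis = 0.0, 0.5 * (lo + hi)  # start inside the bounds
    lp = log_post(beta, x_mis, data)
    draws = []
    for i in range(n_iter):
        # Symmetric Gaussian proposal for the parameter and the imputed value.
        beta_new = beta + rng.gauss(0.0, 0.2)
        x_new = x_mis + rng.gauss(0.0, 0.3)
        lp_new = log_post(beta_new, x_new, data)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            beta, x_mis, lp = beta_new, x_new, lp_new
        if i >= burn:
            draws.append((beta, x_mis))
    return draws

if __name__ == "__main__":
    draws = sample(DATA)
    print("posterior mean of beta: ", sum(b for b, _ in draws) / len(draws))
    print("posterior mean of x_mis:", sum(x for _, x in draws) / len(draws))
```

Each retained draw pairs a parameter value with an imputed covariate, so the uncertainty about the missing value carries into the parameter estimate, and proposals outside the bounds are always rejected. In Stan, a comparable constraint would typically be expressed by declaring the missing value as a parameter with lower and upper bounds.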



