Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness
Source: DataCite Commons | Updated: 2024-03-29 | Indexed: 2024-08-19
Download link:
https://tandf.figshare.com/articles/dataset/Mixed_Matrix_Completion_in_Complex_Survey_Sampling_under_Heterogeneous_Missingness_/25222259/2
Description:
Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data. Supplementary materials for this article are available online.
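The two-stage idea in the description can be illustrated with a small sketch. This is not the authors' algorithm: it assumes Gaussian entries only (no mixed types), ignores survey design weights, replaces the low-rank constraint with a nuclear-norm penalty solved by proximal gradient, and fits one logistic model per column on a hypothetical covariate matrix `X`. All function names and parameters (`fit_logistic`, `svd_soft_threshold`, `complete_matrix`, `lam`, `steps`) are made up for illustration.

```python
import numpy as np

def fit_logistic(X, y, steps=500, lr=0.5):
    """Plain gradient-ascent logistic regression; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    return beta

def svd_soft_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_matrix(Y, observed, X, lam=1.0, steps=200):
    """Illustrative two-stage completion (Gaussian simplification).

    Y        : (n, m) data matrix; missing entries may hold any value or NaN
    observed : (n, m) boolean mask, True where Y is observed
    X        : (n, p) covariates assumed to drive missingness (hypothetical)
    """
    n, m = Y.shape

    # Stage 1: estimate entry-wise observation probabilities, one logistic
    # model per column (an assumption made for this sketch).
    probs = np.empty((n, m))
    for j in range(m):
        beta = fit_logistic(X, observed[:, j].astype(float))
        probs[:, j] = 1.0 / (1.0 + np.exp(-X @ beta))

    # Inverse-probability weights; zero for missing entries, clipped for stability.
    W = observed / np.clip(probs, 0.05, 1.0)

    # Stage 2: minimize 0.5 * sum(W * (Theta - Y)^2) + lam * ||Theta||_*
    # by proximal gradient with step size 1 / max(W).
    Y0 = np.where(observed, Y, 0.0)
    step = 1.0 / W.max()
    Theta = np.zeros((n, m))
    for _ in range(steps):
        grad = W * (Theta - Y0)
        Theta = svd_soft_threshold(Theta - step * grad, step * lam)
    return Theta, probs
```

The weighted squared loss here stands in for the weighted log-likelihood; handling binary or count columns would require the matching exponential-family likelihood for each column, which this sketch does not attempt.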
Provider:
Taylor & Francis
Created:
2024-03-29



