Robust Covariance Estimation and Explainable Outlier Detection for Matrix-Valued Data
Source: DataCite Commons. Updated 2025-05-12; indexed 2025-09-08.
Download link:
https://tandf.figshare.com/articles/dataset/Robust_covariance_estimation_and_explainable_outlier_detection_for_matrix-valued_data/28582137/2
Description:
This work introduces the Matrix Minimum Covariance Determinant (MMCD) method, a novel robust location and covariance estimation procedure designed for data that are naturally represented in the form of a matrix. Unlike standard robust multivariate estimators, which would only be applicable after a vectorization of the matrix-variate samples leading to high-dimensional datasets, the MMCD estimators account for the matrix-variate data structure and consistently estimate the mean matrix, as well as the rowwise and columnwise covariance matrices in the class of matrix-variate elliptical distributions. Additionally, we show that the MMCD estimators are matrix affine equivariant and achieve a higher breakdown point than the maximal achievable one by any multivariate, affine equivariant location/covariance estimator when applied to the vectorized data. An efficient algorithm with convergence guarantees is proposed and implemented. As a result, robust Mahalanobis distances based on MMCD estimators offer a reliable tool for outlier detection. Additionally, we extend the concept of Shapley values for outlier explanation to the matrix-variate setting, enabling the decomposition of the squared Mahalanobis distances into contributions of the rows, columns, or individual cells of matrix-valued observations. Notably, both the theoretical guarantees and simulations show that the MMCD estimators outperform robust estimators based on vectorized observations, offering better computational efficiency and improved robustness. Moreover, real-world data examples demonstrate the practical relevance of the MMCD estimators and the resulting robust Shapley values.
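To make the distance and its decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that estimates of the mean matrix `M`, the rowwise covariance `Sigma_row`, and the columnwise covariance `Sigma_col` are already available (here they are simulated placeholders, not MMCD output). It computes the squared matrix Mahalanobis distance tr(Sigma_col^{-1} (X - M)^T Sigma_row^{-1} (X - M)) and an additive cellwise split of it. The function name `matrix_mahalanobis_sq` is hypothetical, and treating the cellwise terms as the Shapley contributions described above is an assumption.

```python
import numpy as np

def matrix_mahalanobis_sq(X, M, Sigma_row, Sigma_col):
    """Squared matrix Mahalanobis distance and an additive cellwise split.

    Hypothetical helper: M, Sigma_row, Sigma_col are assumed to be given
    (for example by a robust estimator); nothing is estimated here.
    """
    R = X - M                                # centered observation, p x q
    A = np.linalg.solve(Sigma_row, R)        # Sigma_row^{-1} R
    B = np.linalg.solve(Sigma_col, A.T).T    # A Sigma_col^{-1} (Sigma_col symmetric)
    cells = R * B                            # elementwise terms, sum to the distance
    return cells.sum(), cells

# Simulated placeholder inputs (assumptions for illustration only).
rng = np.random.default_rng(0)
p, q = 3, 4
G = rng.normal(size=(p, p))
H = rng.normal(size=(q, q))
Sigma_row = G @ G.T + p * np.eye(p)          # symmetric positive definite, p x p
Sigma_col = H @ H.T + q * np.eye(q)          # symmetric positive definite, q x q
M = np.zeros((p, q))
X = rng.normal(size=(p, q))

d2, cells = matrix_mahalanobis_sq(X, M, Sigma_row, Sigma_col)
row_contrib = cells.sum(axis=1)              # one contribution per row
col_contrib = cells.sum(axis=0)              # one contribution per column

# Cross-check against the vectorized form with covariance Sigma_col (x) Sigma_row.
r = (X - M).flatten(order="F")               # column-major vec(X - M)
d2_vec = r @ np.linalg.solve(np.kron(Sigma_col, Sigma_row), r)
```

The matrix form only solves a p x p and a q x q system, whereas the vectorized check builds a pq x pq Kronecker matrix. That difference in size is one way to see why working with the matrix structure can be cheaper than vectorizing.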
Provider:
Taylor & Francis
Created:
2025-05-12



