A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach
Source: DataCite Commons | Updated: 2024-02-15 | Indexed: 2024-07-28
Download link:
https://tandf.figshare.com/articles/dataset/A_Normality_Test_for_High-dimensional_Data_based_on_a_Nearest_Neighbor_Approach/14963845/2
Description:
Many statistical methodologies for high-dimensional data assume that the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses nearest neighbor information. The proposed method guarantees asymptotic Type I error control in the high-dimensional setting. Simulation studies verify the empirical size of the proposed test when the dimension grows with the sample size, and they also show that the new test has superior power compared with alternative methods. We also illustrate our approach on two datasets widely used in the high-dimensional classification and clustering literature, where deviation from the normality assumption may lead to invalid conclusions.
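The description does not spell out the test statistic. As a rough illustration of how nearest neighbor information can be turned into a normality check, here is a minimal Python/NumPy sketch built on our own assumptions: it draws a reference sample from a normal distribution fitted to the data (sample mean and sample covariance, which stays usable when the dimension d exceeds the sample size n), pools the two samples, counts how many points have their nearest neighbor in the same sample, and calibrates that count by permuting the sample labels. This is a generic nearest-neighbor two-sample construction, not necessarily the authors' statistic or their asymptotic calibration; the function names `simulate_reference`, `nearest_neighbors`, and `nn_normality_test` are hypothetical.

```python
import numpy as np


def simulate_reference(x, m, rng):
    """Draw m points from N(x_bar, S), where S is the sample covariance of x.

    Uses y = x_bar + z @ (x - x_bar) / sqrt(n - 1) with z ~ N(0, I_n), whose
    covariance is exactly S, so no d x d matrix is formed (works when d > n).
    """
    n = x.shape[0]
    mean = x.mean(axis=0)
    centered = x - mean
    z = rng.standard_normal((m, n))
    return mean + z @ centered / np.sqrt(n - 1)


def nearest_neighbors(points):
    """Index of each point's Euclidean nearest neighbor, excluding itself."""
    sq_norms = np.sum(points ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point may not be its own neighbor
    return np.argmin(d2, axis=1)


def nn_normality_test(x, m=None, n_perm=500, seed=0):
    """Hypothetical NN-based normality check; returns (statistic, p-value)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    m = n if m is None else m
    y = simulate_reference(x, m, rng)
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(m, dtype=int)])
    nn = nearest_neighbors(pooled)  # distances do not depend on the labels

    # Statistic: number of points whose nearest neighbor carries the same label.
    observed = int(np.sum(labels == labels[nn]))

    # Permutation reference: shuffle labels, keep the neighbor graph fixed.
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(labels)
        perm_stats[b] = np.sum(perm == perm[nn])

    # One-sided: unusually many same-sample neighbors suggests non-normality.
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    gaussian = data_rng.standard_normal((50, 200))            # d = 200 > n = 50
    heavy_tailed = data_rng.standard_t(df=2, size=(50, 200))  # non-normal
    print("Gaussian:", nn_normality_test(gaussian))
    print("t(2):    ", nn_normality_test(heavy_tailed))
```

A large same-sample count means the observed data and the fitted-normal reference sample occupy visibly different regions, which is taken as evidence against normality. The permutation step is only a convenience for this sketch: it ignores the fact that the mean and covariance were estimated from the data, so its size is not guaranteed, whereas the paper reports asymptotic Type I error control for its own procedure.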
Provider:
Taylor & Francis
Created:
2021-08-31



