U-Statistical Inference for Hierarchical Clustering
收藏Figshare2020-08-21 更新2026-04-28 收录
下载链接:
https://figshare.com/articles/dataset/U-statistical_inference_for_hierarchical_clustering/12844523
下载链接
链接失效反馈官方服务:
资源简介:
Clustering methods are valuable tools for the identification of patterns in high-dimensional data with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with high-dimension low sample size (HDLSS) data. We develop a U-statistics based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These nonparametric methods rely on very few assumptions about the data, and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. Our main result is the development of a hierarchical significance clustering method. To do so, we first introduce an extension of a relevant U-statistic and develop its asymptotic theory. Additionally, as a preliminary step, we propose a binary nonnested significance clustering method and show its optimality in terms of expected values. Our approach is tested through multiple simulations and found to have more statistical power than competing alternatives in all scenarios considered. Our methods are further showcased in three applications ranging from genetics to image recognition problems. Code for these methods is available in R-package uclust. Supplementary materials for this article are available online.
创建时间:
2020-08-21



