U-statistical inference for hierarchical clustering
Taylor & Francis Group, updated 2020-08-25, indexed 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/U-statistical_inference_for_hierarchical_clustering/12844523/1
Description:
Clustering methods are valuable tools for identifying patterns in high-dimensional data, with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop a U-statistics-based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data, and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. Our main result is the development of a hierarchical significance clustering method. To that end, we first introduce an extension of a relevant U-statistic and develop its asymptotic theory. Additionally, as a preliminary step, we propose a binary non-nested significance clustering method and show its optimality in terms of expected values. Our approach is tested through multiple simulations and found to have more statistical power than competing alternatives in all scenarios considered. The methods are further showcased in three applications ranging from genetics to image recognition problems. Code for these methods is available in the R package uclust.
Created:
2020-08-21



