Randomized Spectral Clustering in Large-Scale Stochastic Block Models
收藏DataCite Commons2023-03-29 更新2024-07-29 收录
下载链接:
https://tandf.figshare.com/articles/dataset/Randomized_Spectral_Clustering_in_Large-Scale_Stochastic_Block_Models/19100047
下载链接
链接失效反馈官方服务:
资源简介:
Spectral clustering has been one of the widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study the spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is <i>not</i> necessarily of full rank. To do this, we first use the recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely, the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public. Supplementary materials for this article are available online.
谱聚类(Spectral clustering)是当前网络社区检测领域应用最为广泛的方法之一。然而,大规模网络会对其中的特征值分解(eigenvalue decomposition)带来计算层面的挑战。本文从统计视角出发,研究基于随机草图算法(randomized sketching algorithms)的谱聚类方法,其中我们默认网络数据由一个未必满秩的随机块模型(stochastic block model)生成。为此,我们首先利用近期提出的草图算法,得到两种随机谱聚类算法:基于随机投影的谱聚类与基于随机采样的谱聚类。随后,我们从总体邻接矩阵(population adjacency matrix)的近似误差、误分类误差(misclassification error)以及链接概率矩阵(link probability matrix)的估计误差三个维度,分析了所得算法的理论界。研究结果显示,在温和条件下,随机谱聚类算法可取得与原始谱聚类算法完全一致的理论界。我们还将上述研究结果推广至度校正随机块模型(degree-corrected stochastic block model)。数值实验验证了我们的理论发现,并展现了随机化方法的高效性。本文开发了一款名为Rclust的全新R包(R package)并向公众开放。本文的补充材料可在线获取。
提供机构:
Taylor & Francis
创建时间:
2022-01-31



