Probabilistic Community Detection With Unknown Number of Communities
收藏DataCite Commons2024-08-07 更新2024-07-27 收录
下载链接:
https://tandf.figshare.com/articles/dataset/Probabilistic_community_detection_with_unknown_number_of_communities/6075764/2
下载链接
链接失效反馈官方服务:
资源简介:
A fundamental problem in network analysis is clustering the nodes into groups which share a similar connectivity pattern. Existing algorithms for community detection assume the knowledge of the number of clusters or estimate it a priori using various selection criteria and subsequently estimate the community structure. Ignoring the uncertainty in the first stage may lead to erroneous clustering, particularly when the community structure is vague. We instead propose a coherent probabilistic framework for simultaneous estimation of the number of communities and the community structure, adapting recently developed Bayesian nonparametric techniques to network models. An efficient Markov chain Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform reversible jump MCMC on the number of clusters. The methodology is shown to outperform recently developed community detection algorithms in a variety of synthetic data examples and in benchmark real-datasets. Using an appropriate metric on the space of all configurations, we develop nonasymptotic Bayes risk bounds even when the number of clusters is unknown. Enroute, we develop concentration properties of nonlinear functions of Bernoulli random variables, which may be of independent interest in analysis of related models. Supplementary materials for this article are available online.
网络分析中的一个基础问题,是将节点聚类为具有相似连接模式的群组。现有的社区检测(community detection)算法要么预设已知簇数,要么先通过各类选择准则预先估计簇数,随后再对社区结构进行估计。若忽略第一阶段的不确定性,可能会导致错误的聚类结果,尤其是在社区结构较为模糊的场景下。本文提出了一套连贯的概率框架,用于同时估计社区数量与社区结构,将新近发展的贝叶斯非参数技术(Bayesian nonparametric techniques)适配至网络模型当中。同时,本文设计了一种高效的马尔可夫链蒙特卡洛(Markov chain Monte Carlo, MCMC)算法,无需针对簇数执行可逆跳变马尔可夫链蒙特卡洛(reversible jump MCMC)操作。在各类合成数据集示例与基准真实数据集上的实验表明,所提方法的性能优于近年提出的社区检测算法。借助所有配置空间上的合适度量,本文即使在簇数未知的情况下,也推导得到了非渐近贝叶斯风险界。在此研究过程中,本文还推导了伯努利随机变量(Bernoulli random variables)非线性函数的集中性质,该成果在相关模型的分析中或具备独立研究价值。本文的补充材料可在线获取。
提供机构:
Taylor & Francis
创建时间:
2018-07-11



