
Hierarchical Normalized Completely Random Measures to Cluster Grouped Data

Source: DataCite Commons. Updated: 2021-09-29. Indexed: 2024-07-27.
Download link:
https://tandf.figshare.com/articles/dataset/Hierarchical_Normalized_Completely_Random_Measures_to_Cluster_Grouped_Data/7862645
Description:
In this article, we propose a Bayesian nonparametric model for clustering grouped data. We adopt a hierarchical approach: at the highest level, each group of data is modeled according to a mixture, where the mixing distributions are conditionally independent normalized completely random measures (NormCRMs) centered on the same base measure, which is itself a NormCRM. The discreteness of the shared base measure implies that the processes at the data level share the same atoms. This desirable feature makes it possible to cluster together observations from different groups. We obtain a representation of the hierarchical clustering model by marginalizing with respect to the infinite-dimensional NormCRMs. We investigate the properties of the clustering structure induced by the proposed model and provide theoretical results concerning the distribution of the number of clusters, within and between groups. Furthermore, we offer an interpretation in terms of a generalized Chinese restaurant franchise process, which allows for posterior inference under both conjugate and nonconjugate models. We develop algorithms for fully Bayesian inference and assess performance by means of a simulation study and a real-data illustration. Supplementary materials for this article are available online.
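
As a rough illustration of how atoms shared across groups lead to clusters shared across groups, the sketch below simulates the classical Chinese restaurant franchise, which corresponds to the Dirichlet process special case of a NormCRM. It is not the generalized process developed in the article. The function name `sample_crf` and the concentration parameters `alpha` and `gamma` are assumptions chosen for this example.

```python
import random


def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Simulate cluster labels from a classical Chinese restaurant franchise.

    Assumption: this is the Dirichlet process special case only, with
    hypothetical concentration parameters `alpha` (within each group)
    and `gamma` (shared level). Returns one list of labels per group.
    """
    rng = random.Random(seed)
    tables_per_dish = []  # tables, over all groups, serving each dish
    labels = []
    for n in group_sizes:
        customers_per_table = []  # table occupancy in this group
        dish_of_table = []        # dish (global cluster) of each table
        group_labels = []
        for _ in range(n):
            # Join an existing table with weight equal to its occupancy,
            # or open a new table with weight alpha.
            weights = customers_per_table + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(customers_per_table):
                # A new table picks a dish at the shared level: an existing
                # dish with weight equal to its table count, or a new dish
                # with weight gamma.
                dish_weights = tables_per_dish + [gamma]
                d = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if d == len(tables_per_dish):
                    tables_per_dish.append(0)
                tables_per_dish[d] += 1
                customers_per_table.append(0)
                dish_of_table.append(d)
            customers_per_table[t] += 1
            group_labels.append(dish_of_table[t])
        labels.append(group_labels)
    return labels


if __name__ == "__main__":
    for j, group in enumerate(sample_crf([10, 10, 10], seed=1)):
        print(f"group {j}: {group}")
```

Because dishes are drawn from a single shared pool, the same label can appear in more than one group, which mirrors the between-group clustering described in the abstract.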

Provider:
Taylor & Francis
Created:
2019-03-19