
Model-Based Clustering of Categorical Data Based on the Hamming Distance

Taylor & Francis Group · Updated 2024-11-12 · Indexed 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/Model-based_clustering_of_categorical_data_based_on_the_Hamming_distance/27074920/2
Description:
A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method uses the Hamming distance to define a family of probability mass functions for modeling the data. The members of this family then serve as kernels of a finite mixture model with an unknown number of components. Conjugate Bayesian inference is derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure, and the group-specific parameters; this simplifies computation relative to customary reversible jump algorithms. When the number of components is fixed, the proposed model includes a parsimonious latent class model as a special case. Model performance is assessed via a simulation study and reference datasets, showing improved clustering recovery over existing approaches. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
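To make the idea of a Hamming-distance-based probability mass function concrete, here is a minimal sketch. It assumes one natural form of such a kernel, in which the probability of an observation decays as exp(-d/sigma) with its Hamming distance d from a center, using a single scale sigma shared across attributes. The function names, the shared scale, and the integer coding of categories are illustrative assumptions, not the authors' exact parameterization or code.

```python
import math
from itertools import product


def hamming_distance(x, c):
    """Number of positions at which two equal-length category vectors differ."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, c))


def hamming_pmf(x, center, sigma, n_levels):
    """Assumed kernel: p(x) proportional to exp(-d_H(x, center) / sigma).

    n_levels[j] is the number of categories of attribute j. Each attribute
    contributes a normalizing factor 1 + (m_j - 1) * exp(-1 / sigma):
    one matching category plus (m_j - 1) mismatching ones.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    w = math.exp(-1.0 / sigma)  # weight of a single mismatch
    d = hamming_distance(x, center)
    norm = math.prod(1.0 + (m - 1) * w for m in n_levels)
    return w ** d / norm


# Hypothetical example: three attributes with 2, 3, and 4 categories.
n_levels = (2, 3, 4)
center = (0, 1, 2)
support = list(product(*(range(m) for m in n_levels)))
total = sum(hamming_pmf(x, center, 0.8, n_levels) for x in support)
```

Under this assumed form the mass is highest at the center and shrinks with every mismatching attribute, and `total` sums to 1 over the full support up to floating-point error. A mixture of such kernels with different centers is the kind of clustering model the description refers to.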
Provided by:
Argiento, Raffaele; Paci, Lucia; Filippi-Mazzola, Edoardo
Created:
2024-11-12