Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models

Name: Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models
Creator: Taylor & Francis
Published: 2024-02-21 12:40:34
License: No description available

DataCite Commons (updated 2024-02-21, indexed 2024-07-28)

Download link:

https://tandf.figshare.com/articles/dataset/Block-wise_Variable_Selection_for_Clustering_via_Latent_States_of_Mixture_Models/16653127


Description:

Mixture modeling is a major paradigm for clustering in statistics. In this article, we develop a new block-wise variable selection method for clustering by exploiting the latent states of the hidden Markov model on variable blocks or the Gaussian mixture model. The variable blocks are formed by depth-first-search on a dendrogram created based on the mutual information between any pair of variables. It is demonstrated that the latent states of the variable blocks together with the mixture model parameters can represent the original data effectively and much more compactly. We thus cluster the data using the latent states and select variables according to the relationship between the states and the clusters. As true class labels are unknown in the unsupervised setting, we first generate more refined clusters, namely, semi-clusters, for variable selection and then determine the final clusters based on the dimension reduced data. Experiments on simulated and real data show that the new method is highly competitive in terms of clustering accuracy compared with several widely used methods. Supplementary materials for this article are available online.
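
The block-forming step described above (a dendrogram built from pairwise mutual information, then traversed depth-first) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify how variables are discretized, which linkage criterion builds the dendrogram, or how the traversal is cut into blocks. The sketch assumes already-discrete variables, average-linkage merging on mutual information, and a hypothetical `max_block_size` cutoff; all function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def build_dendrogram(columns):
    """Agglomerative merging (assumed average linkage): repeatedly join the
    two clusters with the highest mean pairwise mutual information.
    Returns a nested-tuple tree whose leaves are column indices."""
    p = len(columns)
    mi = [[mutual_information(columns[i], columns[j]) for j in range(p)]
          for i in range(p)]
    # Each cluster is (subtree, list of leaf indices under it).
    clusters = [(i, [i]) for i in range(p)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                la, lb = clusters[a][1], clusters[b][1]
                score = sum(mi[i][j] for i in la for j in lb) / (len(la) * len(lb))
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


def dfs_leaves(tree):
    """Depth-first traversal that lists the leaves from left to right."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return dfs_leaves(left) + dfs_leaves(right)


def form_blocks(columns, max_block_size):
    """Cut the depth-first leaf order into consecutive blocks.
    The fixed-size cut is an assumption made for this sketch."""
    order = dfs_leaves(build_dendrogram(columns))
    return [order[i:i + max_block_size]
            for i in range(0, len(order), max_block_size)]


if __name__ == "__main__":
    # Toy data: variables 0 and 1 are identical; 3 is the negation of 2.
    toy = [
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0, 1, 0],
    ]
    print(form_blocks(toy, max_block_size=2))
```

Because the depth-first order keeps strongly dependent variables adjacent, each block groups variables that share information, which is what makes a per-block latent state a plausible compact summary. Fitting the hidden Markov model or Gaussian mixture model on the blocks is outside the scope of this sketch.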

Provider:

Taylor & Francis

Created:

2021-09-21
