Sparse Biclustering of Transposable Data
Source: DataCite Commons. Updated: 2020-09-04. Indexed: 2024-07-25.
Download link:
https://tandf.figshare.com/articles/dataset/Sparse_Biclustering_of_Transposable_Data/1209699/1
Description:
We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log-likelihood. We apply an ℓ₁ penalty to the means of the biclusters to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for biclustering based on the matrix-variate normal distribution. The performance of our proposals is demonstrated in a simulation study and on a gene expression dataset. This article has supplementary material online.
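The penalized likelihood described in the abstract can be illustrated with a short sketch. With a common variance, maximizing the normal log-likelihood under an ℓ₁ penalty on the bicluster means is equivalent to minimizing half the sum of squared residuals plus λ times the sum of absolute bicluster means. The code below is a minimal alternating-minimization sketch under that assumption, not the authors' implementation: the function names (`soft_threshold`, `update_means`, `assign`, `sparse_bicluster`), the random initialization, and the fixed iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator: shrink a toward zero by t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def update_means(X, rows, cols, K, R, lam):
    """For fixed labels, each bicluster mean minimizes
    0.5 * sum((x - mu)^2) + lam * |mu| over its block, which gives
    the block average soft-thresholded by lam / (block size)."""
    mu = np.zeros((K, R))
    for k in range(K):
        for r in range(R):
            block = X[np.ix_(rows == k, cols == r)]
            if block.size:  # empty blocks keep a mean of zero
                mu[k, r] = soft_threshold(block.mean(), lam / block.size)
    return mu

def assign(X, centers):
    """Assign each row of X to the row of `centers` with the smallest
    squared distance (the penalty does not depend on the labels)."""
    cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return cost.argmin(axis=1)

def sparse_bicluster(X, K, R, lam, n_iter=20, seed=0):
    """Alternate between mean, row-label, and column-label updates.
    Random initialization and n_iter are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(K, size=n)
    cols = rng.integers(R, size=p)
    for _ in range(n_iter):
        mu = update_means(X, rows, cols, K, R, lam)
        rows = assign(X, mu[:, cols])        # centers have shape (K, p)
        mu = update_means(X, rows, cols, K, R, lam)
        cols = assign(X.T, mu[rows, :].T)    # centers have shape (R, n)
    mu = update_means(X, rows, cols, K, R, lam)
    return rows, cols, mu
```

Setting `lam=0` removes the shrinkage, so the sketch reduces to alternating k-means-style updates on rows and columns; setting `R` equal to the number of columns with one column per cluster would correspond to clustering the rows only, in the spirit of the special cases mentioned in the abstract. Like k-means, this kind of alternating scheme can stop at a local optimum, so several random starts are advisable in practice.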
Provider:
Taylor & Francis
Created:
2016-01-19



