Scalable Bayesian Nonparametric Clustering and Classification
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Scalable_Bayesian_Nonparametric_Clustering_and_Classification/8243045
Description:
We develop a scalable multistep Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference under a wide range of Bayesian nonparametric mixture models feasible for large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to other widely used classifiers. Supplementary materials for this article are available online.
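The description gives the multistep idea only at a high level. As an illustration, here is a minimal Python sketch of one plausible two-step reading: split the data into shards, run the same sampler independently on each shard, then run that same sampler again on the shard-level cluster summaries to merge clusters. It is not the authors' implementation. Assumptions, all hypothetical: 1-D data; a Dirichlet process (Chinese restaurant process) Gaussian mixture with known variance `sigma2` and a conjugate `N(0, tau2)` prior on cluster means, used in place of the product partition model with covariates; a collapsed Gibbs sampler as "the same MCMC sampler"; and a second step that treats each shard-level cluster mean as a single data point. The names `crp_gibbs` and `two_step_cluster` are invented for this example.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def crp_gibbs(data, rng, alpha=1.0, sigma2=1.0, tau2=100.0, sweeps=50):
    """Hypothetical collapsed Gibbs sampler for a 1-D Dirichlet process mixture of
    Gaussians with known variance sigma2 and a N(0, tau2) prior on cluster means.
    Returns one integer cluster label per data point (state after the last sweep)."""
    n = len(data)
    labels = list(range(n))                  # start with every point in its own cluster
    counts = {k: 1 for k in range(n)}        # cluster sizes n_k
    sums = {k: data[k] for k in range(n)}    # within-cluster sums of the data
    next_label = n
    for _ in range(sweeps):
        for i, x in enumerate(data):
            k = labels[i]                    # take point i out of its current cluster
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            options, logw = [], []
            for c in counts:                 # existing cluster: n_c * posterior predictive
                prec = 1.0 / tau2 + counts[c] / sigma2
                mean = (sums[c] / sigma2) / prec
                options.append(c)
                logw.append(math.log(counts[c]) + log_normal_pdf(x, mean, 1.0 / prec + sigma2))
            options.append(next_label)       # new cluster: alpha * prior predictive
            logw.append(math.log(alpha) + log_normal_pdf(x, 0.0, tau2 + sigma2))
            top = max(logw)                  # normalize in log space to avoid underflow
            choice = rng.choices(options, weights=[math.exp(w - top) for w in logw])[0]
            if choice == next_label:
                counts[choice], sums[choice] = 0, 0.0
                next_label += 1
            labels[i] = choice
            counts[choice] += 1
            sums[choice] += x
    return labels


def two_step_cluster(data, n_shards=4, seed=0):
    """Hypothetical two-step scheme: cluster each shard, then merge shard clusters."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)         # random partition of the data into shards
    shards = [idx[s::n_shards] for s in range(n_shards)]
    means, owner = [], {}
    # Step 1: the same sampler runs independently on every shard (embarrassingly parallel).
    for s, shard in enumerate(shards):
        local = crp_gibbs([data[i] for i in shard], random.Random(seed + 1 + s))
        groups = {}
        for i, lab in zip(shard, local):
            groups.setdefault(lab, []).append(i)
        for members in groups.values():
            for i in members:
                owner[i] = len(means)        # point i -> index of its shard-level cluster
            means.append(sum(data[i] for i in members) / len(members))
    # Step 2: the same sampler clusters the shard-level cluster means, merging them.
    merged = crp_gibbs(means, random.Random(seed + 1 + n_shards))
    return [merged[owner[i]] for i in range(len(data))]


gen = random.Random(42)
data = [gen.gauss(-10, 1) for _ in range(20)] + [gen.gauss(10, 1) for _ in range(20)]
labels = two_step_cluster(data, n_shards=4, seed=1)
print(len(set(labels)), "clusters")
```

Because each shard gets its own seeded random generator, the step-1 loop could be dispatched to separate worker processes (for example with `concurrent.futures.ProcessPoolExecutor`) without changing the result; only the cheap merge step needs all shard summaries at once.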
Created:
2019-06-07



