Finite-dimensional Discrete Random Structures and Bayesian Clustering
Source: DataCite Commons. Updated: 2024-06-26. Indexed: 2024-07-29.
Download link:
https://tandf.figshare.com/articles/dataset/Finite-dimensional_Discrete_Random_Structures_and_Bayesian_Clustering/21583377/1
Description:
Discrete random probability measures stand out as effective tools for Bayesian clustering. The investigation in the area has been very lively, with a strong emphasis on nonparametric procedures based on either the Dirichlet process or on more flexible generalizations, such as the normalized random measures with independent increments (NRMI). The literature on finite-dimensional discrete priors is much more limited and mostly confined to the standard Dirichlet-multinomial model. While such a specification may be attractive due to conjugacy, it suffers from considerable limitations when it comes to addressing clustering problems. In order to overcome these, we introduce a novel class of priors that arise as the hierarchical compositions of finite-dimensional random discrete structures. Despite the analytical hurdles such a construction entails, we are able to characterize the induced random partition and determine explicit expressions of the associated urn scheme and of the posterior distribution. A detailed comparison with (infinite-dimensional) NRMIs is also provided: indeed, informative bounds for the discrepancy between the partition laws are obtained. Finally, the performance of our proposal over existing methods is assessed on a real application where we study a publicly available dataset from the Italian education system comprising the scores of a mandatory nationwide test.
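For readers unfamiliar with the baseline the description refers to, the following is a minimal sketch of the predictive (urn) rule of the standard Dirichlet-multinomial model, not of the new priors introduced by the authors. It assumes a symmetric Dirichlet(alpha/K, ..., alpha/K) prior on K component weights; the function name and the parameters `n`, `K`, `alpha`, and `seed` are illustrative choices and do not come from the dataset.

```python
import random


def dirichlet_multinomial_partition(n, K, alpha, seed=0):
    """Sample a random partition of n items from the urn scheme of a
    symmetric Dirichlet(alpha/K, ..., alpha/K) prior on K components.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = []  # counts[j] = size of the j-th occupied cluster
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n):
        k = len(counts)  # number of clusters occupied so far
        # Joining occupied cluster j has weight n_j + alpha/K.
        weights = [c + alpha / K for c in counts]
        # Opening a new cluster has weight (K - k) * alpha / K,
        # which is only positive while fewer than K clusters are occupied.
        if k < K:
            weights.append((K - k) * alpha / K)
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
        labels.append(j)
    return labels, counts


labels, counts = dirichlet_multinomial_partition(n=200, K=5, alpha=1.0)
print(len(counts), counts)
```

After i items have been assigned, the weights sum to i + alpha, so each step is a proper probability after normalization. The number of clusters can never exceed K, which is one way to see why this finite-dimensional specification behaves differently from Dirichlet process or NRMI priors in clustering problems.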
Provider:
Taylor & Francis
Created:
2022-11-18



