
Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

Source: Figshare. Updated: 2024-12-16. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Integration_of_Neural_Embeddings_and_Probabilistic_Models_in_Topic_Modeling/28034713
Description:
Topic modeling, the task of discovering topics in large volumes of text, has advanced with the help of deep learning. This paper presents two novel approaches to topic modeling that integrate embeddings derived from BERTopic with the multi-grain clustering topic model (MGCTM). Because topics in a corpus are inherently hierarchical and multi-scale, our methods use MGCTM to capture topic structure at multiple levels of granularity. We make MGCTM more expressive by introducing the Generalized Dirichlet and Beta-Liouville distributions as priors, which give greater flexibility in modeling topic proportions and capture richer relationships between topics. Comprehensive experiments on various datasets show that the proposed models achieve better topic coherence and granularity than state-of-the-art methods. Our findings underscore the potential of hybrid architectures that combine neural embeddings with advanced probabilistic modeling to push the boundaries of topic modeling.
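
To make the flexibility claim about the priors concrete, the following is a minimal sketch of drawing topic proportions from a Generalized Dirichlet distribution using its standard stick-breaking construction from independent Beta variables. It is not taken from the paper or the dataset: the function name `sample_generalized_dirichlet` and the parameter values are assumptions chosen for illustration, and the sketch does not reproduce MGCTM or the authors' inference procedure.

```python
import random


def sample_generalized_dirichlet(a, b, rng=random):
    """Draw one point on the simplex from a Generalized Dirichlet.

    Illustrative helper (hypothetical name, not from the paper).
    `a` and `b` are equal-length lists of positive shape parameters.
    Returns len(a) + 1 non-negative proportions that sum to 1.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")

    remaining = 1.0          # part of the stick not yet assigned
    proportions = []
    for a_i, b_i in zip(a, b):
        # Each break point has its own Beta(a_i, b_i), so every
        # component gets two free parameters instead of one.
        v = rng.betavariate(a_i, b_i)
        proportions.append(remaining * v)
        remaining *= 1.0 - v
    proportions.append(remaining)  # last component takes the rest
    return proportions


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example parameters for four topics (three break points).
    theta = sample_generalized_dirichlet([2.0, 1.0, 3.0], [5.0, 2.0, 1.0], rng)
    print(theta, sum(theta))
```

Because each component has its own pair of Beta parameters, the Generalized Dirichlet has a more general covariance structure than the standard Dirichlet, whose components are always negatively correlated. This is one common reading of "greater flexibility in modeling topic proportions".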

Created: 2024-12-16