
"Hierarchical Contrastive Neural Topic Modeling Based on Optimal Transport: A Scalable Framework for Temporal Semantics"

Source: DataCite Commons. Updated: 2025-10-16. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/hierarchical-contrastive-neural-topic-modeling-based-optimal-transport-scalable-framework
Description:
In today's data-driven society, dynamic topic models are widely used to reveal how latent topics in document collections evolve over time, which matters for knowledge evolution analysis, hotspot discovery, and trend prediction. Existing neural topic models still fall short in modeling document-topic alignment and hierarchical semantic structure: they depend heavily on pre-trained language models, incur high inference costs, and lack explicit modeling of hierarchy. To address these issues, this paper proposes a novel Hierarchical Contrastive Neural Topic Model based on Optimal Transport (HCNTM), which introduces Wasserstein distance constraints and hierarchical contrastive learning losses under a Bayesian variational framework to achieve fine-grained alignment of document-topic distributions and multi-granularity semantic decoupling. Specifically, the model first constructs continuous-time topic embedding trajectories through neural ordinary differential equations, then employs an optimal transport mapping to eliminate semantic drift between adjacent time slices, and further designs a contrastive learning strategy, with positive and negative samples obtained from hierarchical clustering, to explicitly strengthen topic-word and topic-topic hierarchical relationships. Large-scale experiments show that the method significantly outperforms nine representative models, including HiCOT and EnCOT, on multiple real datasets in terms of topic coherence, evolution smoothness, and prediction accuracy. The results not only validate the effectiveness of combining optimal transport with contrastive learning, but also provide new theoretical and methodological support for dynamic semantic evolution modeling.

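The description mentions aligning topics across adjacent time slices with optimal transport. As a rough illustration only, the sketch below computes an entropy-regularised transport plan (Sinkhorn iterations) between two small sets of topic embeddings. It is not the HCNTM implementation: the listing does not say which solver the authors use, and the embeddings, the names `sinkhorn_plan`, `topics_t`, `topics_next`, and the values of `reg` and `n_iters` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=500):
    """Entropy-regularised OT plan between histograms a and b (Sinkhorn)."""
    K = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
# Hypothetical topic embeddings at two adjacent time slices (4 topics, 8 dims).
topics_t = rng.normal(size=(4, 8))
topics_next = topics_t + 0.05 * rng.normal(size=(4, 8))  # small simulated drift

# Squared Euclidean cost between every topic pair, scaled to [0, 1].
diff = topics_t[:, None, :] - topics_next[None, :, :]
cost = (diff ** 2).sum(axis=-1)
cost = cost / cost.max()

a = np.full(4, 0.25)                     # uniform mass on topics at time t
b = np.full(4, 0.25)                     # uniform mass on topics at time t+1
plan = sinkhorn_plan(cost, a, b)
ot_cost = float((plan * cost).sum())     # regularised transport cost
matched = plan.argmax(axis=1)            # most likely successor of each topic
print(ot_cost, matched)
```

A low `ot_cost` indicates that the topic set moved little between slices. A model of the kind described could penalise this quantity during training, although how HCNTM actually uses its Wasserstein constraint is not specified here.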
Provider:
IEEE DataPort
Created:
2025-10-16