Replication data for: Topic-partitioned multinetwork embeddings

NIAID Data Ecosystem, indexed 2026-03-07

Download link:

https://doi.org/10.7910/DVN/GGHMFT


Description:

We introduce a joint model of network content and context designed for exploratory analysis of email networks via visualization of topic-specific communication patterns. Our model is an admixture model for text and network attributes which uses multinomial distributions over words as mixture components for explaining text and latent Euclidean positions of actors as mixture components for explaining network attributes. We validate the appropriateness of our model by achieving state-of-the-art performance on a link prediction task and by achieving semantic coherence equivalent to that of latent Dirichlet allocation. We demonstrate the capability of our model for descriptive, explanatory, and exploratory analysis by investigating the inferred topic-specific communication patterns of a new government email dataset, the New Hanover County email corpus. This work was supported in part by the Center for Intelligent Information Retrieval and in part by the NSF GRFP under grant #1122374. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.
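The description says that latent Euclidean positions of actors explain network attributes, but it does not state the link function. The sketch below is an illustration only, assuming a logistic latent-distance form that is common in latent space network models: the probability of a tie falls as the distance between two actors grows. The function names, the intercept value, and the actor positions are all hypothetical and are not taken from the replication data.

```python
import math


def link_probability(pos_a, pos_b, intercept):
    """Logistic latent-distance link probability (assumed form, not the authors' code)."""
    distance = math.dist(pos_a, pos_b)  # Euclidean distance between the two positions
    return 1.0 / (1.0 + math.exp(-(intercept - distance)))


# Hypothetical 2-D positions: each topic has its own latent space for the same actors.
positions = {
    "topic_0": {"alice": (0.0, 0.0), "bob": (0.5, 0.0), "carol": (3.0, 4.0)},
    "topic_1": {"alice": (0.0, 0.0), "bob": (3.0, 4.0), "carol": (0.5, 0.0)},
}
INTERCEPT = 1.0  # hypothetical baseline propensity to communicate


def topic_link_probabilities(topic):
    """Return the link probability for every unordered actor pair in one topic."""
    actors = sorted(positions[topic])
    return {
        (a, b): link_probability(positions[topic][a], positions[topic][b], INTERCEPT)
        for i, a in enumerate(actors)
        for b in actors[i + 1:]
    }


if __name__ == "__main__":
    for topic in positions:
        for pair, prob in topic_link_probabilities(topic).items():
            print(topic, pair, round(prob, 3))
```

Because each topic carries its own set of positions, the same pair of actors can be close in one topic and far apart in another, which is the kind of topic-specific communication pattern the description refers to.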


Created:

2012-12-13
