A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/A_Tale_of_Two_Datasets_Representativeness_and_Generalisability_of_Inference_for_Samples_of_Networks/23925915
Description:
The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We wish to combine their strengths to learn the social forces that shape household contact formation and facilitate simulation for prediction of disease spread, while generalising to the population of households in the region. To accomplish this, we describe a flexible framework for specifying multi-network models in the exponential family class and identify the requirements for inference and prediction under this framework to be consistent, identifiable, and generalisable, even when data are incomplete; explore how these requirements may be violated in practice; and develop a suite of quantitative and graphical diagnostics for detecting violations and suggesting improvements to candidate models. We report on the effects of network size, geography, and household roles on household contact patterns (activity, heterogeneity in activity, and triadic closure). Supplementary materials for this article are available online.
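As a rough illustration of what "multi-network models in the exponential family class" can look like, the sketch below writes a generic exponential-family random graph model for a sample of independent networks. The notation is assumed here for illustration and is not taken from the article: $y_s$ is the contact network of household $s$, $x_s$ its covariates (for example size or region), $g$ a vector of sufficient statistics such as edge and triangle counts, $\theta$ the shared parameter vector, and $\mathcal{Y}_s$ the set of possible networks on household $s$. The article's own specification may differ.

```latex
% Generic ERGM for a single household network y_s (assumed notation)
\Pr(Y_s = y_s \mid x_s; \theta)
  = \frac{\exp\{\theta^\top g(y_s, x_s)\}}{\kappa(\theta, x_s)},
\qquad
\kappa(\theta, x_s) = \sum_{y' \in \mathcal{Y}_s} \exp\{\theta^\top g(y', x_s)\}

% Joint model for S independently observed household networks
\Pr(Y_1 = y_1, \dots, Y_S = y_S; \theta)
  = \prod_{s=1}^{S} \frac{\exp\{\theta^\top g(y_s, x_s)\}}{\kappa(\theta, x_s)}
```

Under this form, effects of network size, geography, or household roles would enter through how $g$ depends on $x_s$, and partially observed households contribute by summing the probability over all networks consistent with the observed contacts.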
Created:
2023-08-10



