Regularized Cross-Sectional Network Modeling with Missing Data: A Comparison of Methods

Source: DataCite Commons. Updated: 2025-09-17. Indexed: 2026-02-09.
Download link:
https://tandf.figshare.com/articles/dataset/Regularized_Cross-Sectional_Network_Modeling_with_Missing_Data_A_Comparison_of_Methods/30145926/1
Description:
Many applications of network modeling involve cross-sectional data on psychological variables (e.g., symptoms of psychological disorders), and analyses are often conducted using a regularized Gaussian graphical model (GGM) with a lasso penalty, also known as the graphical lasso or glasso. Methodology for handling missing data with glasso is underdeveloped, which precludes the use of planned missing data designs to reduce participant fatigue. In this research, we compare three approaches to handling missing data with glasso. The first resembles a two-stage estimation approach, borrowed from the covariance structure modeling literature, in which a saturated covariance matrix among the items is estimated before glasso is applied. The second and third approaches combine glasso and the expectation-maximization (EM) algorithm in a single stage and select the tuning parameter using either the extended Bayesian information criterion (EBIC) or cross-validation. We compared these approaches in a simulation study that varied sample size, proportion of missing data, and network saturation. An example with data from the Patient Reported Outcomes Measurement Information System is also provided. The EM algorithm with cross-validation performed best, but all methods appeared to be viable strategies with larger samples and less missing data.
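
To make the first stage of the two-stage approach concrete, here is a minimal sketch of an EM estimate of the saturated mean vector and covariance matrix from incomplete multivariate normal data. It is an illustration only, not the authors' implementation: the function name `em_saturated_cov`, the coding of missing entries as `np.nan`, the starting values, and the convergence settings are all assumptions made for this example, and the sketch assumes data are missing at random and that every variable is observed at least once.

```python
import numpy as np


def em_saturated_cov(X, max_iter=200, tol=1e-6):
    """EM estimate of the mean and saturated covariance of multivariate
    normal data. Missing entries are coded as np.nan (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: available-case means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            o = ~m
            x = X[i].copy()
            cond_cov = np.zeros((p, p))
            if m.any():
                if o.any():
                    # E-step: regress the missing block on the observed block.
                    beta = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                    x[m] = mu[m] + (x[o] - mu[o]) @ beta
                    cond_cov[np.ix_(m, m)] = (
                        sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ beta
                    )
                else:
                    # Fully missing row: fall back on the current estimates.
                    x[m] = mu[m]
                    cond_cov = sigma.copy()
            sum_x += x
            # Expected cross-product adds the conditional covariance.
            sum_xx += np.outer(x, x) + cond_cov
        # M-step: update the mean and the (maximum likelihood) covariance.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        new_sigma = (new_sigma + new_sigma.T) / 2  # enforce symmetry
        change = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma
```

In a two-stage workflow, the covariance matrix returned here would then be handed to a graphical lasso solver; scikit-learn's `sklearn.covariance.graphical_lasso(emp_cov, alpha)` is one option, although the abstract does not say which software the authors used. With complete data the function reduces to the sample mean and the maximum likelihood (divide-by-n) covariance.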

Provider:
Taylor & Francis
Created:
2025-09-17