Supplementary data: Comparison of two-stage methods for count data in Mendelian randomization: a simulation study

Figshare. Updated 2026-03-19; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Supplementary_data_Comparison_of_two-stage_methods_for_count_data_in_Mendelian_randomization_a_simulation_study/31811824

Description:

These are peer-reviewed supplementary materials for the article 'Comparison of two-stage methods for count data in Mendelian randomization: a simulation study' published in the Journal of Comparative Effectiveness Research.

Supplementary Table 1: Performance metrics (Bias, RMSE, 95% Confidence Interval Width, and Coverage) for the multiple-instrument scenario using an unweighted allele score from four weak SNPs.
Supplementary Table 2: 95% Confidence Interval width, Coverage, and Type I Error for a null causal effect (0) across varying instrument strengths (IS = 0.001–1).
Supplementary Table 3: 95% Confidence Interval width and Coverage for a positive causal effect (0.4) across varying instrument strengths (IS = 0.001–1).
Supplementary Table 4: 95% Confidence Interval width, Coverage, and Type I Error for a null causal effect (0) across varying confounding effects (0.05–1.2).
Supplementary Table 5: 95% Confidence Interval width and Coverage for a positive causal effect (0.4) across varying confounding effects (0.05–1.2).

Aim: Mendelian randomization (MR) is an instrumental variable (IV) method that utilizes genetic variants to establish causality between risk factors and outcomes in observational studies. These methods were primarily developed under assumptions appropriate for continuous or approximately normally distributed variables. However, in many biomedical and clinical studies, exposures and outcomes are naturally recorded as counts, such as the number of disease episodes or clinical events. Despite this, two-stage MR methods are applied to count data without a clear understanding of their validity under such settings. While individual-level MR methods like two-stage predictor substitution (TSPS) and two-stage residual inclusion (TSRI) are common, their comparative performance for count exposures and outcomes remains unclear.

Materials & methods: We conducted the first systematic evaluation of TSPS and TSRI for count data using Poisson and negative binomial models across realistic MR scenarios. Simulations varied instrument strength (IS), confounding, sample size and also focused on invalid instruments. Performance was assessed by bias, root mean square error (RMSE), 95% confidence interval (CI) coverage, CI width and Type I error rate. To demonstrate practical application, we applied these methods to investigate the causal relationship between alcohol consumption and gout attacks using empirical data.

Results: Our results revealed that across all scenarios, TSRI with the Poisson model produced the most stable estimates with lower bias and RMSE. TSRI achieved near-nominal coverage and narrower CI across varying IS and confounding levels, maintaining Type I error close to 0.05. IS significantly impacted performance, with IS = 0.5 yielding estimates closer to the true values, while weaker instruments (IS = 0.1) led to higher RMSE and bias. Increasing sample size in the presence of invalid and weak genetic variants increased the bias. In additional simulations with multiple weak instruments, TSRI continued to outperform TSPS, yielding lower bias or RMSE, narrow CI width and near-nominal coverage across sample sizes. In our application, alcohol consumption was causally associated with an estimated 11.6–12.7% increase in the expected number of gout attacks per year per unit increase in alcohol intake, although the presence of an invalid single nucleotide polymorphism likely biased this estimate.

Conclusion: This study advances MR methodology by clarifying how TSPS and TSRI behave with count exposures and outcomes, providing practical guidance for valid instrument selection and reliable causal inference in MR studies involving count data.
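
For readers unfamiliar with the two estimators compared in the abstract, the following is a minimal sketch of TSPS and TSRI with log-link Poisson models on simulated count data. It is not the authors' code: the data-generating values (allele frequency 0.3, effect 0.2, confounder coefficient 0.3, sample size 3000), the function names, and the hand-written Newton-Raphson fitter are all assumptions chosen for illustration, and the sketch omits standard errors, negative binomial models and invalid instruments.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson(rows, y, iters=30):
    """Fit a log-link Poisson regression by Newton-Raphson.

    Each row must start with 1.0 (intercept). Returns the coefficients.
    """
    p = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = math.exp(min(eta, 30.0))  # clamp guards against overflow
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += mu * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, effect, rng):
    """Simulate allele counts G, a count exposure X and a count outcome Y.

    All numeric constants here are illustrative assumptions.
    """
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # genotype 0/1/2
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder of X and Y
        xi = rpois(math.exp(-0.5 + 0.6 * gi + 0.3 * u), rng)
        yi = rpois(math.exp(-1.0 + effect * xi + 0.3 * u), rng)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def two_stage(g, x, y):
    """Return (TSPS, TSRI) estimates of the log-rate effect of X on Y."""
    a = fit_poisson([[1.0, gi] for gi in g], x)  # stage 1: X ~ G
    xhat = [math.exp(a[0] + a[1] * gi) for gi in g]
    resid = [xi - xh for xi, xh in zip(x, xhat)]
    # TSPS stage 2: replace X by its stage-1 prediction.
    tsps = fit_poisson([[1.0, xh] for xh in xhat], y)[1]
    # TSRI stage 2: keep X and add the stage-1 residual as a covariate.
    tsri = fit_poisson([[1.0, xi, r] for xi, r in zip(x, resid)], y)[1]
    return tsps, tsri


rng = random.Random(2026)
g, x, y = simulate(3000, 0.2, rng)
tsps, tsri = two_stage(g, x, y)
print(f"TSPS: {tsps:.3f}  TSRI: {tsri:.3f}  (simulated effect: 0.2)")
```

Under a log link, a coefficient b on the exposure corresponds to a (exp(b) - 1) x 100% change in the expected outcome count per unit of exposure, which is the scale on which the abstract reports the gout result. A single simulated dataset like this one says nothing about bias or coverage; those require repeating the simulation many times, as the study does.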

Created:

2026-03-19
