Data fusion using weakly aligned sources
Taylor & Francis Group | Updated 2025-03-13 | Indexed 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/Data_fusion_using_weakly_aligned_sources/28590287/1
Description:
We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment is known up to finite-dimensional parameters. We quantify the additional efficiency gains achieved through the integration of these weakly aligned sources. We characterize the semiparametric efficiency bound and provide a general means to construct estimators achieving these efficiency gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype.
Provided by:
Duan, Rui; Luedtke, Alex; Li, Sijia; Gilbert, Peter B.
Created:
2025-03-13



