Score Matching for Compositional Distributions
Source: DataCite Commons. Updated: 2022-03-03. Indexed: 2024-08-18.
Download link:
https://tandf.figshare.com/articles/dataset/Score_matching_for_compositional_distributions/17161180
Description:
Compositional data are challenging to analyse due to the non-negativity and sum-to-one constraints on the sample space. With real data, it is often the case that many of the compositional components are highly right-skewed, with large numbers of zeros. Major limitations of currently available models for compositional data include one or more of the following: insufficient flexibility in terms of distributional shape; difficulty in accommodating zeros in the data in estimation; and lack of computational viability in moderate to high dimensions. In this article, we propose a new model, the polynomially tilted pairwise interaction (PPI) model, for analysing compositional data. Maximum likelihood estimation is difficult for the PPI model. Instead, we propose novel score matching estimators, which entails extending the score matching approach to Riemannian manifolds with boundary. These new estimators are available in closed form and simulation studies show that they perform well in practice. As our main application, we analyse real microbiome count data with fixed totals using a multinomial latent variable model with a PPI model for the latent variable distribution. We prove that, under certain conditions, the new score matching estimators are consistent for the parameters in the new multinomial latent variable model.
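To make the data structure in the description concrete, the following is a minimal sketch of a multinomial latent variable setup: a latent composition (non-negative, summing to one) is drawn first, and counts with a fixed total are then drawn from a multinomial given that composition. This is only an illustration under stated assumptions. A Dirichlet distribution is used as a simple stand-in for the latent distribution; it is not the PPI model and it does not implement the score matching estimators. The parameter values, the fixed total, and the function names `sample_composition` and `sample_counts` are hypothetical.

```python
import random


def sample_composition(alpha, rng):
    """Draw one point on the simplex from a Dirichlet(alpha) stand-in.

    Assumption: Dirichlet is used only for illustration; it is not the PPI model.
    """
    # Normalising independent gamma draws gives a Dirichlet sample.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_counts(proportions, total_count, rng):
    """Draw multinomial counts with a fixed total, given latent proportions."""
    categories = range(len(proportions))
    draws = rng.choices(categories, weights=proportions, k=total_count)
    counts = [0] * len(proportions)
    for d in draws:
        counts[d] += 1
    return counts


rng = random.Random(0)
alpha = [0.2, 0.2, 0.5, 2.0]  # hypothetical; small values push mass toward the boundary
total_count = 50              # hypothetical fixed total per sample

latent = sample_composition(alpha, rng)          # unobserved composition
counts = sample_counts(latent, total_count, rng)  # observed counts

print("latent composition:", [round(p, 4) for p in latent])
print("observed counts:   ", counts)
```

With small shape parameters, some latent components sit close to zero, so the corresponding observed counts are often exactly zero. This mirrors the right-skewed, zero-heavy components that the description mentions.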
Provider:
Taylor & Francis
Created:
2021-12-10



