Fast Nonseparable Gaussian Stochastic Process With Application to Methylation Level Interpolation
Source: figshare.com | Updated: 2023-06-01 | Indexed: 2025-03-25
Download link:
https://figshare.com/articles/dataset/Fast_Nonseparable_Gaussian_Stochastic_Process_With_Application_to_Methylation_Level_Interpolation/9805397/1
Description:
The Gaussian stochastic process (GaSP) has been widely used as a prior over functions because of its flexibility and tractability in modeling. However, evaluating the likelihood costs O(n^3) operations, where n is the number of observed points in the process, because it requires inverting the covariance matrix. This bottleneck prevents GaSP from being widely used with large-scale data. We propose a general class of nonseparable GaSP models for multiple functional observations with a fast and exact algorithm, in which the computation is linear (O(n)) and requires no approximation to compute the likelihood. We show that the commonly used linear regression and separable models are special cases of the proposed nonseparable GaSP model. In an epigenetic application, the proposed nonseparable GaSP model accurately predicts genome-wide DNA methylation levels and compares favorably with alternative methods, such as linear regression, random forest, and localized Kriging. The algorithm for fast computation is implemented in the FastGaSP R package on CRAN. Supplementary materials for this article are available online.
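
The gap between the O(n^3) and O(n) costs mentioned in the description can be illustrated with a small sketch. This is not the authors' nonseparable algorithm and does not use the FastGaSP package; it assumes a single noise-free function with an exponential (Ornstein-Uhlenbeck) covariance on sorted one-dimensional inputs, a case where the process is Markov and the likelihood factorizes into one-step conditionals. The function names `loglik_dense` and `loglik_markov`, and the parameters `sigma2` and `gamma`, are hypothetical choices made for this illustration.

```python
import numpy as np

def loglik_dense(x, y, sigma2, gamma):
    """O(n^3) reference: build the full covariance matrix and factorize it."""
    K = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
    L = np.linalg.cholesky(K)          # cubic-cost step
    z = np.linalg.solve(L, y)          # z = L^{-1} y, so z @ z = y^T K^{-1} y
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def loglik_markov(x, y, sigma2, gamma):
    """O(n): same likelihood via the Markov property (x must be sorted)."""
    rho = np.exp(-np.diff(x) / gamma)                      # one-step correlations
    mean = np.concatenate(([0.0], rho * y[:-1]))           # E[y_i | y_{i-1}]
    var = sigma2 * np.concatenate(([1.0], 1.0 - rho**2))   # Var[y_i | y_{i-1}]
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

# Synthetic, irregularly spaced inputs (illustrative data only).
rng = np.random.default_rng(0)
x = np.cumsum(rng.uniform(0.05, 0.15, size=200))
y = np.sin(x) + 0.1 * rng.standard_normal(200)

dense = loglik_dense(x, y, sigma2=1.0, gamma=2.0)
fast = loglik_markov(x, y, sigma2=1.0, gamma=2.0)
print(np.isclose(dense, fast))
```

Both functions evaluate the same Gaussian log-likelihood; the second never forms the n-by-n covariance matrix, which is why its cost grows linearly with n. Smoother kernels and noisy observations need a larger state vector and a filtering recursion, which this sketch does not cover.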
Provider:
figshare.com



