Data from: Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix
Source: DataONE · Updated: 2014-05-05 · Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Given the drawbacks of implementing multivariate analysis for mapping multiple traits in genome-wide association studies (GWAS), principal component analysis (PCA) has been widely used to generate independent "super traits" from the original multivariate phenotypic traits for univariate analysis. However, parameter estimates in this framework may not be the same as those from the joint analysis of all traits, leading to spurious linkage results. In this paper we propose to perform the PCA on the residual covariance matrix instead of the phenotypic covariance matrix, based on which multiple traits are transformed to a group of pseudo principal components. PCA on the residual covariance matrix allows each pseudo principal component to be analyzed separately, and all parameter estimates are equivalent to those obtained by the joint multivariate analysis under a linear transformation. In addition, a fast least absolute shrinkage and selection operator (LASSO) for estimating the sparse oversaturated genetic model greatly reduces the computational costs of this procedure. Extensive simulations demonstrate the statistical and computational efficiency of the proposed method. We illustrate this method in a GWAS for 20 slaughtering traits and meat quality traits in beef cattle.
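The transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the residual covariance is estimated from an ordinary least-squares null model with fixed covariates only, and the function name `pseudo_principal_components`, the simulated data, and all variable names are hypothetical. The sketch rotates the traits by the eigenvectors of the estimated residual covariance matrix and then recomputes the residual covariance of the rotated traits, which should be diagonal.

```python
import numpy as np


def pseudo_principal_components(Y, X):
    """Rotate traits Y (n x m) by the eigenvectors of the residual covariance.

    X (n x p) holds the null-model covariates, including an intercept column.
    """
    # Null model Y = X B + E, fitted by ordinary least squares (an assumption).
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    n, p = X.shape
    R = E.T @ E / (n - p)           # estimated residual covariance (m x m)
    eigvals, U = np.linalg.eigh(R)  # R = U diag(eigvals) U^T
    Z = Y @ U                       # pseudo principal components (n x m)
    return Z, U, eigvals


# Simulated data, purely illustrative (not the beef cattle data set).
rng = np.random.default_rng(0)
n, m = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mixing = np.array([[1.0, 0.0, 0.0], [0.6, 1.0, 0.0], [-0.4, 0.3, 1.0]])
Y = X @ rng.normal(size=(2, m)) + rng.normal(size=(n, m)) @ mixing.T

Z, U, eigvals = pseudo_principal_components(Y, X)

# Residual covariance of the rotated traits under the same null model.
B_z, *_ = np.linalg.lstsq(X, Z, rcond=None)
E_z = Z - X @ B_z
R_z = E_z.T @ E_z / (n - X.shape[1])
```

Because `U` is orthogonal, `Z @ U.T` recovers `Y`, so estimates obtained per component can be mapped back to the original trait scale through the same linear transformation. The marker effects and the LASSO step mentioned in the abstract are deliberately left out of this sketch.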
Created:
2014-05-05



