
Inference on the Proportion of Variance Explained in Principal Component Analysis

Taylor & Francis Group. Updated 2025-10-15. Indexed 2026-04-16.
Download link:
https://tandf.figshare.com/articles/dataset/Inference_on_the_proportion_of_variance_explained_in_principal_component_analysis/29828784/2
Description:
Principal component analysis (PCA) is a longstanding approach for dimension reduction. It rests upon the assumption that the underlying signal has low rank, and thus can be well-summarized using a small number of dimensions. The output of PCA is typically represented using a scree plot, which displays the proportion of variance explained (PVE) by each principal component. While the PVE is extensively reported in routine analyses, to the best of our knowledge the notion of inference on the PVE remains unexplored. We consider inference on a new population quantity for the PVE with respect to an unknown matrix mean. Our interest lies in the PVE of the sample principal components (as opposed to unobserved population principal components); thus, the population PVE that we introduce is defined conditional on the sample singular vectors. We show that it is possible to conduct inference, in the sense of confidence intervals, p-values, and point estimates, on this population quantity. Furthermore, we can conduct valid inference on the PVE of a subset of the principal components, even when the subset is selected using a data-driven approach such as the elbow rule. We demonstrate our approach in simulation and in an application to gene expression data. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
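For readers unfamiliar with the quantity a scree plot displays, the following is a minimal sketch of the ordinary sample PVE, computed from the singular values of a data matrix. It is not the population quantity or the inference procedure described above. The function name `sample_pve`, the column-centering step, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sample_pve(X):
    """Sample proportion of variance explained by each principal component.

    Assumes rows are observations and columns are features, and that the
    columns are centered before the SVD (an illustrative choice).
    """
    Xc = X - X.mean(axis=0)                   # center each column
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to component variances
    return var / var.sum()


# Hypothetical toy data: a rank-2 signal plus a small amount of noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))
X = signal + 0.1 * rng.normal(size=(100, 10))

pve = sample_pve(X)  # one value per component; these are the heights in a scree plot
```

Because the toy signal has rank 2, nearly all of the sample PVE falls on the first two components, which is the pattern an elbow rule looks for.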
Provided by:
Witten, Daniela; Panigrahi, Snigdha; Bien, Jacob; Perry, Ronan
Created:
2025-10-15