Memory Efficient Principal Component Analysis for the Dimensionality Reduction of Large Mass Spectrometry Imaging Data Sets

Source: Figshare · Updated: 2016-02-19 · Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/Memory_Efficient_Principal_Component_Analysis_for_the_Dimensionality_Reduction_of_Large_Mass_Spectrometry_Imaging_Data_Sets/2432593
Description:
A memory-efficient algorithm for computing the principal component analysis (PCA) of large mass spectrometry imaging data sets is presented. Mass spectrometry imaging (MSI) enables two- and three-dimensional overviews of hundreds of unlabeled molecular species in complex samples such as intact tissue. PCA, in combination with data binning or other reduction algorithms, has been widely used in the unsupervised processing of MSI data and as a dimensionality reduction method prior to clustering and spatial segmentation. Standard implementations of PCA require the entire data set to be held in random access memory (RAM). This imposes an upper limit on the amount of data that can be processed and forces a compromise between the number of pixels and the number of peaks included. With growing interest in multivariate analysis of large 3D multislice data sets and ongoing improvements in instrumentation, the ability to retain all pixels and many more peaks is increasingly important. We present a new method that places no limit on the number of pixels and allows more peaks to be retained. The new technique was validated against the MATLAB (The MathWorks Inc., Natick, Massachusetts) implementation of PCA (princomp) and then used to reduce, without discarding any peaks or pixels, multiple serial sections acquired from a single mouse brain, a data set too large to be analyzed with princomp. Then, k-means clustering was performed on the reduced data set. We further demonstrate, using simulated data of 83 slices with 20,535 pixels per slice (44 GB in total), that the new method can be used in combination with existing tools to process an entire organ. MATLAB code implementing the memory-efficient PCA algorithm is provided.
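The abstract does not describe how the algorithm works internally, so the following is only a minimal sketch of one standard way to make PCA memory efficient in the pixel dimension. It is not the authors' MATLAB code. The sketch streams blocks of pixels (for example, one tissue slice at a time), keeps only a running mean and a peaks × peaks scatter matrix, eigendecomposes the resulting covariance, and then projects each block in a second pass. The block source `chunks` and the function names `streaming_pca` and `project` are hypothetical. `chunks` must be re-iterable, for example re-read from disk, because the data are visited twice.

```python
import numpy as np


def streaming_pca(chunks, n_components):
    """PCA from an iterable of (pixels, peaks) blocks, one block in memory at a time.

    Only a running mean (length n_peaks) and a centred scatter matrix
    (n_peaks x n_peaks) are kept, so memory depends on the number of peaks,
    not on the number of pixels. Hypothetical sketch, not the published code.
    """
    n = 0
    mean = None
    scatter = None
    for block in chunks:
        x = np.asarray(block, dtype=np.float64)
        if x.shape[0] == 0:
            continue  # skip empty blocks
        if mean is None:
            mean = np.zeros(x.shape[1])
            scatter = np.zeros((x.shape[1], x.shape[1]))
        n_b = x.shape[0]
        mean_b = x.mean(axis=0)
        centred = x - mean_b
        scatter_b = centred.T @ centred
        # Pairwise merge of block statistics into the running totals; this
        # avoids the cancellation of the naive X^T X - n * mean mean^T formula.
        delta = mean_b - mean
        n_total = n + n_b
        mean = mean + delta * (n_b / n_total)
        scatter = scatter + scatter_b + np.outer(delta, delta) * (n * n_b / n_total)
        n = n_total
    if n < 2:
        raise ValueError("need at least two pixels")
    cov = scatter / (n - 1)                  # sample covariance, same scaling as np.cov
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvals[order], eigvecs[:, order]


def project(chunks, mean, components):
    """Second pass over the data: yield PCA scores block by block."""
    for block in chunks:
        yield (np.asarray(block, dtype=np.float64) - mean) @ components
```

The scatter matrix is n_peaks × n_peaks, so its size is fixed by the number of peaks rather than the number of pixels. For example, 10,000 peaks in double precision need 10,000² × 8 bytes = 800 MB, however many pixels are streamed through. The scores produced by `project` could then be clustered, for instance with scikit-learn's `KMeans`. scikit-learn's `IncrementalPCA` is an existing out-of-core alternative to this hand-written sketch.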
Created:
2016-02-19