The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE)
NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://figshare.com/articles/dataset/The_Often-Overlooked_Power_of_Summary_Statistics_in_Exploratory_Data_Analysis_Comparison_of_Pattern_Recognition_Entropy_PRE_to_Other_Summary_Statistics_and_Introduction_of_Divided_Spectrum-PRE_DS-PRE_/16598726
Description:
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance of a series of summary statistics in EDA, including in clustering. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4. We compare them with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
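Each summary statistic named above reduces one spectrum to a single number, with no factor model to fit. The following is a minimal Python sketch, not the authors' code. It assumes that PRE is the Shannon entropy (base 2 here) of a non-negative spectrum normalized to unit sum. It also assumes that the 1-norm is the sum of absolute intensities and that SSQ is the raw (not mean-centered) sum of squares. The function names are hypothetical, and X4 is left out because the abstract does not define it.

```python
import math
import statistics


def pattern_recognition_entropy(spectrum):
    """PRE, assumed here to be the Shannon entropy (in bits) of a non-negative
    spectrum after normalizing it to unit sum; zero channels contribute nothing."""
    total = sum(spectrum)
    if total <= 0 or min(spectrum) < 0:
        raise ValueError("PRE (as sketched here) needs non-negative data with a positive sum")
    entropy = 0.0
    for x in spectrum:
        if x > 0:
            p = x / total  # fraction of the total signal in this channel
            entropy -= p * math.log2(p)
    return entropy


def summary_statistics(spectrum):
    """Direct (non-factor-based) summary statistics for one spectrum; X4 is omitted."""
    return {
        "PRE": pattern_recognition_entropy(spectrum),
        "mean": statistics.fmean(spectrum),
        "STD": statistics.stdev(spectrum),  # sample (n - 1) standard deviation, an assumption
        "1-norm": sum(abs(x) for x in spectrum),
        "range": max(spectrum) - min(spectrum),
        "SSQ": sum(x * x for x in spectrum),  # raw sum of squares, an assumption
    }


# A flat spectrum has maximal entropy; a single sharp peak has zero entropy.
print(summary_statistics([1.0, 1.0, 1.0, 1.0])["PRE"])  # 2.0
print(summary_statistics([0.0, 0.0, 4.0, 0.0])["PRE"])  # 0.0
```

For a hyperspectral image, the same function would be applied to every pixel's spectrum, turning each statistic into a single-channel image that can be viewed or clustered directly.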
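The abstract borrows the chromatographic “critical pair”: the adjacent pair of peaks with the lowest resolution. Below is a hedged sketch of one way this could be applied to a summary statistic; the paper's exact procedure may differ. It assumes that each sample class yields a set of scores (for example, PRE values) and that classes are ordered by their mean score. It also assumes that resolution takes the chromatographic form Rs = 2·Δ/(w1 + w2), with peak width w = 4σ. The function names and toy scores are hypothetical.

```python
import math
import statistics


def resolution(scores_a, scores_b):
    """Chromatographic-style resolution between two groups of scores.

    Assumption: Rs = 2 * |mean_b - mean_a| / (w_a + w_b) with width w = 4 * std,
    which simplifies to |mean_b - mean_a| / (2 * (std_a + std_b)).
    """
    mean_a, mean_b = statistics.fmean(scores_a), statistics.fmean(scores_b)
    spread = statistics.stdev(scores_a) + statistics.stdev(scores_b)
    if spread == 0:
        # Zero-width "peaks": fully resolved unless they coincide.
        return math.inf if mean_a != mean_b else 0.0
    return abs(mean_b - mean_a) / (2 * spread)


def critical_pair(groups):
    """Order groups by mean score and return (name_1, name_2, Rs) for the
    neighboring pair with the lowest resolution, i.e. the hardest to separate."""
    ordered = sorted(groups.items(), key=lambda item: statistics.fmean(item[1]))
    neighbors = zip(ordered, ordered[1:])
    pairs = [(a, b, resolution(sa, sb)) for (a, sa), (b, sb) in neighbors]
    return min(pairs, key=lambda pair: pair[2])


# Toy, made-up PRE scores for three sample classes (three replicates each).
groups = {"A": [1.0, 1.2, 1.1], "B": [2.0, 2.2, 2.1], "C": [2.4, 2.6, 2.5]}
name_1, name_2, rs = critical_pair(groups)
print(name_1, name_2, round(rs, 2))  # B C 1.0
```

Under this reading, a larger critical-pair resolution means the statistic separates even its least-separated classes better. That gives a single number for comparing statistics.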
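Judging only from its name and the abstract, DS-PRE appears to compute PRE on separate regions of a spectrum. This gives a short vector of entropies instead of one value, and that vector can then serve as the input to kNN. The sketch below assumes contiguous, near-equal-width segments and Euclidean-distance kNN with majority voting. `ds_pre`, `knn_predict`, the segment count, and the toy spectra are all hypothetical.

```python
import math
from collections import Counter


def pre(segment):
    """Shannon entropy (bits) of a non-negative segment normalized to unit sum.

    Assumed PRE definition; an all-zero segment is given zero entropy (also an assumption).
    """
    total = sum(segment)
    if total <= 0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in segment if x > 0)


def ds_pre(spectrum, n_segments):
    """Hypothetical DS-PRE: cut the spectrum into n_segments contiguous pieces of
    near-equal width and return the PRE of each piece as a feature vector."""
    n = len(spectrum)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(spectrum)")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [pre(spectrum[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors (Euclidean)."""
    nearest = sorted(
        zip(train_features, train_labels), key=lambda item: math.dist(item[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, made-up spectra: class "A" peaks in the first half, class "B" in the second.
train_spectra = [
    [9, 1, 1, 1, 1, 1, 1, 1],
    [8, 2, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 9],
    [1, 1, 1, 1, 1, 1, 2, 8],
]
labels = ["A", "A", "B", "B"]
features = [ds_pre(s, 2) for s in train_spectra]
query = ds_pre([7, 1, 1, 1, 1, 1, 1, 1], 2)
print(knn_predict(features, labels, query, k=3))  # A
```

In this toy case, mirrored spectra have identical whole-spectrum PRE, so a single PRE value cannot tell the two classes apart. The two-segment vector can, which illustrates how dividing the spectrum can add discrimination power.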
Created:
2021-09-09



