A Statistical Approach for Identifying the Best Combination of Normalization and Imputation Methods for Label-Free Proteomics Expression Data
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/A_Statistical_Approach_for_Identifying_the_Best_Combination_of_Normalization_and_Imputation_Methods_for_Label-Free_Proteomics_Expression_Data/28006212
Description:
Label-free proteomics expression data sets often exhibit data heterogeneity
and missing values, necessitating the development of effective normalization
and imputation methods. The selection of appropriate normalization
and imputation methods is inherently data-specific, and choosing the
optimal approach from the available options is critical for ensuring
robust downstream analysis. This study aimed to identify the most
suitable combination of these methods for quality control and accurate
identification of differentially expressed proteins. We constructed
nine combinations by pairing each of three normalization methods,
namely locally weighted linear regression (LOESS), variance stabilization
normalization (VSN), and robust linear regression (RLR), with each of
three imputation methods: k-nearest neighbors (k-NN), local least-squares
(LLS), and singular value decomposition (SVD). We used statistical
measures, including the pooled coefficient of variation (PCV), pooled
estimate of variance (PEV), and pooled median absolute deviation (PMAD),
to assess intragroup and intergroup variation. The combinations yielding
the lowest value of each statistical measure were chosen as the
suitable normalization and imputation methods for the data set.
The performance of this approach was tested using two spiked-in standard
label-free proteomics benchmark data sets. The identified combinations
yielded a low normalized root-mean-square error (NRMSE) and showed
better performance in identifying spiked-in proteins. The developed
approach is available as the R package 'lfproQC' and as a user-friendly
Shiny web application (https://dabiniasri.shinyapps.io/lfproQC and
http://omics.icar.gov.in/lfproQC), making it a valuable resource for
researchers looking to apply this method to their data sets.
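
The selection rule described above can be illustrated with a minimal Python sketch. It is not the lfproQC implementation (lfproQC is an R package), and the exact formulas are assumptions: PCV and PMAD are taken here as the average within-group CV and MAD over all proteins and groups, PEV as the degrees-of-freedom-weighted pooled within-group variance averaged over proteins, and NRMSE as the RMSE divided by the standard deviation of the true values. The function names `pooled_measures`, `best_combinations`, and `nrmse` are hypothetical.

```python
from math import sqrt
from statistics import mean, median, pvariance, variance


def pooled_measures(rows, groups):
    """Return PCV, PEV and PMAD for one normalized and imputed matrix.

    rows   : list of proteins, each a list of intensities (one per sample)
    groups : group label for each sample column
    Assumes at least two samples per group and non-zero group means.
    """
    labels = sorted(set(groups))
    cvs, mads, pooled_vars = [], [], []
    for row in rows:
        weighted_var, dof_total = 0.0, 0
        for label in labels:
            vals = [v for v, g in zip(row, groups) if g == label]
            var = variance(vals)                      # sample variance
            cvs.append(sqrt(var) / mean(vals))        # within-group CV
            med = median(vals)
            mads.append(median([abs(v - med) for v in vals]))
            weighted_var += (len(vals) - 1) * var
            dof_total += len(vals) - 1
        pooled_vars.append(weighted_var / dof_total)  # pooled variance
    return {"PCV": mean(cvs), "PEV": mean(pooled_vars), "PMAD": mean(mads)}


def best_combinations(results):
    """Pick, for each measure, the combination with the lowest value.

    results : {combination name: dict returned by pooled_measures}
    """
    return {
        m: min(results, key=lambda combo: results[combo][m])
        for m in ("PCV", "PEV", "PMAD")
    }


def nrmse(true_vals, imputed_vals):
    """RMSE of imputed vs. true values, scaled by the SD of the true values."""
    mse = mean([(t - i) ** 2 for t, i in zip(true_vals, imputed_vals)])
    return sqrt(mse / pvariance(true_vals))
```

In use, `pooled_measures` would be called once for each of the nine normalization and imputation combinations, and the resulting dictionary passed to `best_combinations`; the three measures need not agree on a single combination.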
Created:
2024-12-11



