A systematic evaluation of normalization methods and probe replicability using Infinium EPIC methylation data
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-04-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.cnp5hqc7v
Description:
Background: The Infinium EPIC array measures the methylation status
of > 850,000 CpG sites. The EPIC BeadChip combines two probe designs:
Infinium Type I and Type II probes. These probe types exhibit different
technical characteristics which may confound analyses. Numerous
normalization and pre-processing methods have been developed to reduce
probe type bias as well as other issues such as background and dye bias.

Methods: This study evaluates the performance of various normalization
methods using 16 replicated samples and three metrics: absolute beta-value
difference, overlap of non-replicated CpGs between replicate pairs, and
effect on beta-value distributions. Additionally, we carried out Pearson’s
correlation and intraclass correlation coefficient (ICC) analyses using
both raw and SeSAMe 2-normalized data.

Results: The method we define as SeSAMe 2, which applies the regular
SeSAMe pipeline followed by an additional round of QC (pOOBAH masking), was
the best-performing normalization method, while quantile-based
methods were the worst-performing. Whole-array
Pearson’s correlations were found to be high. However, in agreement with
previous studies, a substantial proportion of the probes on the EPIC array
showed poor reproducibility (ICC < 0.50). The majority of
poor-performing probes have beta values close to either 0 or 1, and
relatively low standard deviations. These results suggest that poor probe
reliability largely reflects limited biological variation rather
than technical measurement variation. Importantly, normalizing the data
with SeSAMe 2 dramatically improved ICC estimates, with the proportion of
probes with ICC values > 0.50 increasing from 45.18% (raw data) to
61.35% (SeSAMe 2).
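
The two replicate-based metrics named above can be illustrated with a minimal Python sketch. This is not the authors' code: the description does not say which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, as are the function names, the (probes × replicate pairs) array layout, and the simulated beta values.

```python
import numpy as np


def mean_abs_beta_diff(rep_a, rep_b):
    """Mean absolute beta-value difference for each replicate pair.

    rep_a, rep_b: arrays of shape (n_probes, n_pairs); column j of each
    array holds the two technical replicates of the same sample.
    """
    return np.mean(np.abs(rep_a - rep_b), axis=0)


def icc_oneway(rep_a, rep_b):
    """Per-probe one-way random-effects ICC(1,1) for k = 2 replicates.

    Assumption: the source does not specify the ICC model; this form is
    used here only for illustration.
    """
    k = 2
    n = rep_a.shape[1]
    pair_mean = (rep_a + rep_b) / 2.0
    grand_mean = pair_mean.mean(axis=1, keepdims=True)
    # Between-sample mean square: spread of the pair means across samples.
    msb = k * np.sum((pair_mean - grand_mean) ** 2, axis=1) / (n - 1)
    # Within-sample mean square: disagreement between the two replicates.
    msw = np.sum(
        (rep_a - pair_mean) ** 2 + (rep_b - pair_mean) ** 2, axis=1
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (msb - msw) / denom, np.nan)


# Simulated example (hypothetical values): 16 replicate pairs, two probes.
rng = np.random.default_rng(0)
n_pairs = 16
variable_probe = rng.uniform(0.2, 0.8, size=n_pairs)  # real between-sample spread
invariant_probe = np.full(n_pairs, 0.02)               # beta near 0, no spread
truth = np.vstack([variable_probe, invariant_probe])
rep_a = np.clip(truth + rng.normal(0, 0.01, truth.shape), 0, 1)
rep_b = np.clip(truth + rng.normal(0, 0.01, truth.shape), 0, 1)

print("ICC per probe:", icc_oneway(rep_a, rep_b))
print("Mean |beta difference| per pair:", mean_abs_beta_diff(rep_a, rep_b))
```

In this simulation both probes receive the same measurement noise, yet the invariant probe typically scores a far lower ICC because almost all of its variance is noise. That is the pattern the abstract describes for probes with beta values near 0 or 1.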
Provider:
Dryad
Created:
2023-05-30



