five

Data Matrix Normalization and Merging Strategies Minimize Batch-specific Systemic Variation in scRNA-Seq Data

收藏
NIAID Data Ecosystem2026-05-10 收录
下载链接:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE185590
下载链接
链接失效反馈
官方服务:
资源简介:
Single-cell RNA sequencing (scRNA-seq) can reveal accurate and sensitive RNA abundance in a single sample, but robust integration of multiple samples remains challenging. Large-scale scRNA-seq data generated by different workflows or laboratories can contain batch-specific systemic variation. Such variation challenges data integration by confounding sample-specific biology with undesirable batch-specific systemic effects. Therefore, there is a need for guidance in selecting computational and experimental approaches to minimize batch-specific impacts on data interpretation and a need to empirically evaluate the sources of systemic variation in a given dataset. To uncover the contributions of experimental variables to systemic variation, we intentionally perturb four potential sources of batch-effect in five human peripheral blood samples. We investigate sequencing replicate, sequencing depth, sample replicate, and the effects of pooling libraries for concurrent sequencing. To quantify the downstream effects of these variables on data interpretation, we introduced a new scoring metric, the Cell Misclassification Statistic (CMS), which identifies losses to cell type fidelity that occur when merging datasets of different batches. CMS reveals an undesirable overcorrection by popular batch-effect correction and data integration methods. We show that optimizing gene expression matrix normalization and merging can reduce the need for batch-effect correction and minimize the risk of overcorrecting true biological differences between samples. scRNA-seq of the gene expression profile, surface-protein expression profile of 5 Peripheral blood mononuclear cell (PBMC) from healthy donors, in which the GEX libraries of 2 of the samples were sequenced twice.
创建时间:
2025-10-05
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作