An interpretable and adaptive autoencoder for efficient tissue deconvolution

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP586781

Description:

Deconvolution models are a powerful tool for extracting cell type-specific information from bulk gene expression profiles. Current methods leverage advanced machine learning models and high-resolution sequencing, such as single-cell RNA-sequencing (scRNA-seq), and show promising results across diverse tissues and conditions. However, they still present important limitations. First, many depend on selecting a robust reference, a choice that can strongly affect the deconvolution. Second, the pseudobulk data used for training and real bulk RNA-seq samples often exhibit strong distribution shifts, which are currently unaccounted for. Finally, most deconvolution approaches behave as black boxes, which can compromise the reliability of the results. Here, we present Sweetwater, an adaptive and interpretable autoencoder that efficiently deconvolves bulk samples leveraging multiple classes of reference data. Moreover, we propose an improved way of generating training data from a mixture of FACS-sorted FASTQ files, reducing platform-specific biases and outperforming current single-cell-based references. Furthermore, we introduce a gold standard dataset to facilitate fair and accurate evaluation of deconvolution approaches. Finally, we demonstrate that Sweetwater adapts effectively to deconvolved samples during training, uncovering biologically meaningful patterns and enhancing the reliability of the results. Sweetwater is available at https://github.com/ML4BM-Lab/Sweetwater, and we anticipate it will expedite the accurate examination of high-throughput clinical data across diverse applications.

Overall design: This dataset was developed to benchmark cell type deconvolution methods using both single-cell and bulk RNA sequencing data from peripheral blood immune cells. Four immune cell types (T cells, B cells, Neutrophils, and Monocytes) were FACS-isolated from human PBMCs, and bulk RNA-seq was performed on each purified population.
To generate controlled mixtures with known composition, seven artificial mixes were created by manually combining RNA from the four purified cell types in specific proportions, and each mix was prepared in duplicate (e.g., Mix 1.1 and Mix 1.2). The mixtures were composed as follows:

Mix 1.1 and 1.2: 25% Neutrophils, 25% T cells, 25% B cells, 25% Monocytes
Mix 2.1 and 2.2: 30% Neutrophils, 30% T cells, 10% B cells, 30% Monocytes
Mix 3.1 and 3.2: 50% Neutrophils, 20% T cells, 10% B cells, 20% Monocytes
Mix 4.1 and 4.2: 47% Neutrophils, 47% T cells, 6% B cells, 0% Monocytes
Mix 5.1: 50% Neutrophils, 49% T cells, 1% B cells, 0% Monocytes
Mix 5.2: 50% Neutrophils, 35% T cells, 1% B cells, 14% Monocytes
Mix 6.1 and 6.2: 25% Neutrophils, 25% T cells, 0% B cells, 50% Monocytes
Mix 7.1 and 7.2: 50% Neutrophils, 50% T cells, 0% B cells, 0% Monocytes

This experimental design allows deconvolution methods to be tested under controlled, varying levels of cell type abundance, including rare cell type scenarios and highly imbalanced compositions, providing a robust benchmark framework.
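
As an illustration of how the known mixture compositions could serve as ground truth, the following minimal Python sketch encodes the nominal percentages listed in the description and scores a predicted composition by root-mean-square error. The names `TRUE_MIXES`, `true_fractions`, and `rmse`, the choice of RMSE as the metric, and the example prediction are all assumptions made for illustration; they are not part of the dataset or of Sweetwater.

```python
import math

# Cell type order used for every tuple below.
CELL_TYPES = ("Neutrophils", "T cells", "B cells", "Monocytes")

# Nominal mixing percentages as stated in the dataset description.
TRUE_MIXES = {
    "Mix 1.1": (25, 25, 25, 25),
    "Mix 1.2": (25, 25, 25, 25),
    "Mix 2.1": (30, 30, 10, 30),
    "Mix 2.2": (30, 30, 10, 30),
    "Mix 3.1": (50, 20, 10, 20),
    "Mix 3.2": (50, 20, 10, 20),
    "Mix 4.1": (47, 47, 6, 0),
    "Mix 4.2": (47, 47, 6, 0),
    "Mix 5.1": (50, 49, 1, 0),
    "Mix 5.2": (50, 35, 1, 14),
    "Mix 6.1": (25, 25, 0, 50),
    "Mix 6.2": (25, 25, 0, 50),
    "Mix 7.1": (50, 50, 0, 0),
    "Mix 7.2": (50, 50, 0, 0),
}


def true_fractions(mix_name):
    """Return the nominal composition of a mix as fractions in [0, 1]."""
    return {ct: pct / 100 for ct, pct in zip(CELL_TYPES, TRUE_MIXES[mix_name])}


def rmse(predicted, mix_name):
    """Root-mean-square error between predicted and nominal fractions."""
    truth = true_fractions(mix_name)
    squared = [(predicted[ct] - truth[ct]) ** 2 for ct in CELL_TYPES]
    return math.sqrt(sum(squared) / len(squared))


# Sanity check: every mix should account for 100% of the RNA.
for name, pcts in TRUE_MIXES.items():
    assert sum(pcts) == 100, name

# Hypothetical output of some deconvolution method for Mix 1.1.
hypothetical_prediction = {
    "Neutrophils": 0.30,
    "T cells": 0.20,
    "B cells": 0.25,
    "Monocytes": 0.25,
}
print(round(rmse(hypothetical_prediction, "Mix 1.1"), 4))
```

Per-mix scores like this could be averaged across all fourteen samples, or reported separately for the low-abundance cases (for example the 1% B cell mixes), depending on what a benchmark is meant to emphasize.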

Created:

2025-09-11
