Additional file 1 of Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
Source: DataCite Commons. Updated: 2021-09-15. Indexed: 2024-07-28.
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Choice_of_pre-processing_pipeline_influences_clustering_quality_of_scRNA-seq_datasets/16620628
Resource description:
Additional file 1:
Fig. S1 Total gene detection of all datasets compared after processing with either kallisto or Cell Ranger. The Venn diagrams show the number of genes detected by both pipelines and the number detected by only one of them.
Fig. S2 Violin plots showing the distribution of gene and UMI detection per cell for all analyzed datasets (Table 1) run with the Cell Ranger pipeline.
Fig. S3 Violin plots showing the distribution of gene and UMI detection per cell for all analyzed datasets (Table 1) run with the kallisto pipeline.
Fig. S4 Cell counts of all datasets compared after processing with either kallisto forced or Cell Ranger. The Venn diagrams show the number of cell barcodes detected by both pipelines and the number detected by only one of them.
Fig. S5 Alignment results of all datasets (Table 1) run with either Cell Ranger or kallisto forced against the Ensembl reference. a Percent alignment rates of all reads against the reference transcriptome. b Total gene detection. c Median gene counts over all cells per dataset. d Median UMI counts over all cells per dataset. e Total cell counts of each dataset.
Fig. S6 Total gene detection of all datasets compared after processing with either kallisto forced or Cell Ranger. The Venn diagrams show the number of genes detected by both pipelines and the number detected by only one of them.
Fig. S7 Violin plots showing the distribution of gene and UMI detection per cell for all analyzed datasets (Table 1) run with the kallisto forced pipeline.
Fig. S8 Violin plots showing the distribution of gene and UMI detection per cell of the dr_pineal_s2 dataset after additional filtering for downstream analysis, run with either Cell Ranger (a), kallisto (b) or kallisto forced (c).
Fig. S9 Downstream analysis of dr_pineal_s2 before cluster merging. a 2D visualization using UMAP of Cell Ranger analyzed clusters before merging, with resolution equal to 0.9. Each point represents a single cell, colored according to cell type. The cells were clustered into 21 types. b Expression profile of marker genes according to cluster [7] of (a). Clusters 0, 1, 8 and 18 are all rod-like PhR subclusters. They all expressed rod-like PhR markers (exorh, gnat1, gngt1), but at different expression levels, which resulted in their separation. For simplicity, they were merged and are referred to as a single rod-like PhRs cluster in the main text. Similarly, clusters 7 and 12 were merged into a single Müller-glia like cluster, clusters 2, 5 and 16 into a single RPE-like cluster, clusters 3 and 10 into a single habenula kiss1 cluster, and clusters 11 and 19 into a single leukocytes cluster. c 2D visualization using UMAP of Cell Ranger analyzed clusters, with resolution equal to 2. The cells were clustered into 31 types. However, the two different cone-like PhR cell types are still not distinguished from one another. d Expression profile of marker genes according to cluster of (c). e 2D visualization using UMAP of kallisto analyzed dr_pineal_s2 clusters before merging, with resolution equal to 0.9. The cells were clustered into 24 types. f Expression profile of marker genes according to cluster of (e). As described above, clusters 1, 2, 3, 7 and 21 were merged into a single rod-like PhRs cluster, clusters 0, 9 and 17 into a single RPE-like cluster, clusters 11 and 12 into a single Müller-glia like cluster, clusters 4, 5 and 20 into a single habenula kiss1 cluster, and clusters 13 and 22 into a single leukocytes cluster. g 2D visualization using UMAP of kallisto forced analyzed dr_pineal_s2 clusters, with resolution equal to 1.2. The cells were clustered into 27 types. h Expression profile of marker genes according to cluster of (g). The col14a1b gene was detected only in the kallisto and kallisto forced datasets and is the strongest differentially expressed (DE) marker within the red cone-like cluster (f, h).
Fig. S10 Heatmap of genes with higher counts in kallisto pre-processed pineal data. All the UMI counts for both kallisto and Cell Ranger were summed, and the diff_ratio value was calculated for each gene as $$\frac{kallisto\_counts - CellRanger\_counts}{kallisto\_counts + CellRanger\_counts}$$ (Additional file 1: Fig. S10). The top 80 diff_ratio genes, as well as the top 20 genes uniquely identified in kallisto, were plotted according to the average scaled expression per cluster.
Fig. S11 Heatmap of genes with higher counts in Cell Ranger pre-processed pineal data. All the UMI counts for both kallisto and Cell Ranger were summed, and the diff_ratio value was calculated for each gene as $$\frac{kallisto\_counts - CellRanger\_counts}{kallisto\_counts + CellRanger\_counts}$$ (Additional file 1: Fig. S11). The top 80 diff_ratio genes, as well as the top 20 genes uniquely identified in Cell Ranger, were plotted according to the average scaled expression per cluster.
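The diff_ratio described for Fig. S10 and Fig. S11 can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`compare_pipelines`, `top_n`), the toy gene names and the toy counts are hypothetical. It also assumes that the ratio ranking is restricted to genes detected by both pipelines and that pipeline-specific genes are ranked by their summed UMI counts, which the description does not state.

```python
from typing import Dict, List, Tuple


def diff_ratio(kallisto_counts: float, cellranger_counts: float) -> float:
    """(kallisto - CellRanger) / (kallisto + CellRanger) for one gene."""
    return (kallisto_counts - cellranger_counts) / (kallisto_counts + cellranger_counts)


def compare_pipelines(
    kallisto: Dict[str, float], cellranger: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Split genes into shared (with diff_ratio) and pipeline-specific sets.

    Inputs map gene name -> UMI counts summed over all cells.
    Assumption: only genes present in both inputs receive a ratio.
    """
    shared = set(kallisto) & set(cellranger)
    ratios = {
        gene: diff_ratio(kallisto[gene], cellranger[gene])
        for gene in shared
        if kallisto[gene] + cellranger[gene] > 0  # ratio is undefined at 0/0
    }
    only_kallisto = {g: n for g, n in kallisto.items() if g not in cellranger}
    only_cellranger = {g: n for g, n in cellranger.items() if g not in kallisto}
    return ratios, only_kallisto, only_cellranger


def top_n(scores: Dict[str, float], n: int, descending: bool = True) -> List[str]:
    """Return the n gene names with the highest (or lowest) score."""
    return sorted(scores, key=scores.get, reverse=descending)[:n]


if __name__ == "__main__":
    # Toy summed UMI counts; gene names and values are made up.
    kallisto = {"gene_a": 30, "gene_b": 10, "gene_c": 5}
    cellranger = {"gene_a": 10, "gene_b": 10, "gene_d": 8}
    ratios, only_k, only_c = compare_pipelines(kallisto, cellranger)
    print(top_n(ratios, 1))                    # highest ratio: favoured by kallisto
    print(top_n(ratios, 1, descending=False))  # lowest ratio: favoured by Cell Ranger
    print(top_n(only_k, 1), top_n(only_c, 1))  # pipeline-specific genes
```

The ratio lies between -1 and 1: positive values indicate more UMI counts with kallisto, negative values more with Cell Ranger. Under these assumptions, the Fig. S10 selection would correspond to `top_n(ratios, 80)` plus `top_n(only_k, 20)`, and the Fig. S11 selection to `top_n(ratios, 80, descending=False)` plus `top_n(only_c, 20)`.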
Provider:
figshare
Created:
2021-09-15



