Raw and processed (filtered and annotated) scRNAseq data

Name: Raw and processed (filtered and annotated) scRNAseq data
Creator: figshare
Published: 2025-06-01 06:48:38
License: No description available

DataCite Commons; updated 2025-06-01, indexed 2024-08-19

Download link:

https://figshare.com/articles/dataset/Raw_and_processed_filtered_and_annotated_scRNAseq_data/25561950/1

Description:

Single-cell RNA-seq data generated and reported as part of the manuscript entitled "Human CD34+-derived plasmacytoid dendritic cells as surrogates for primary pDCs and potential cancer immunotherapy" by Fiore et al.

Raw and processed (filtered and annotated) data are provided, which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

1- raw.h5ad provides the concatenated raw/unfiltered table of counts as obtained from Cell Ranger, along with relevant metadata in the standard H5AD format.
2- processed.h5ad provides raw and normalized counts for those cells that passed QC and were annotated as pDC, along with relevant metadata in the standard H5AD format.

For instance, to load data in R, try:

    library(zellkonverter)
    raw <- readH5AD(file = "./raw.h5ad", X_name = "counts")
    processed <- readH5AD(file = "./processed.h5ad", X_name = "logcounts")

##############################

scRNAseq data generation:

Differentiated CB-DCs (3 independent donors), either left unprimed or primed with IFN, were used to characterize the heterogeneity of the in vitro differentiation protocol. For comparison, primary pan-DCs (3 independent donors) were isolated from PBMCs as described in the publication. CB-DCs and primary pan-DCs were normalized to 10,000 pDCs per well and stimulated with TLR9 or TLR7 agonists for 4 hrs or left untreated. A total of 27 samples were included for scRNAseq. Single-cell RNA-seq was performed using Chromium Connect (10x Genomics). Next GEM Automated Single Cell 5' Reagent Kits v2 (PN-1000290, 10x Genomics, Pleasanton, CA, USA) were used following the manufacturer’s protocol. Roughly 8,000–10,000 cells per sample were diluted to a density of 100–800 cells/μL in PBS plus 1% BSA, as determined with a Cellometer Auto 2000 Cell Viability Counter (Nexcelom Bioscience, Lawrence, MA), and were loaded onto the chip.
The quality and concentration of both cDNA and libraries were assessed using an Agilent BioAnalyzer with High Sensitivity kit (#5067-4626, Agilent, Santa Clara, CA, USA) and a Qubit Fluorometer with dsDNA HS assay kit (#Q33230, Thermo Fisher Scientific, Waltham, MA) according to the manufacturer’s recommendations. For sequencing, samples were pooled in equimolar amounts and sequenced on an Illumina NovaSeq 6000 with a targeted read depth of 20,000 reads/cell; sequencing parameters were set to Read 1 (26 cycles), i7 Index (10 cycles), i5 Index (10 cycles) and Read 2 (90 cycles). The Cell Ranger mkfastq function was used to convert the output files into FASTQ files.

scRNAseq data analysis:

For data processing and quality control, raw sequencing reads were mapped to the GRCh38 genome using the Cell Ranger Single Cell software (10x Genomics). Raw gene expression matrices generated per sample were merged and analyzed with the besca package. First, low-quality cells and potential multiplets were excluded (minimum 600 genes and 1,000 counts, maximum 6,500 genes and 60,000 counts), resulting in 4,000 to 8,000 cells per sample and a total of 183,398 cells passing quality control for downstream analysis. Filtered cells were normalized as log-transformed UMI counts per 10,000 [log(CP10K+1)]. After scaling the gene expression, the most variable genes per sample were calculated (minimum mean expression of 0.0125, maximum mean expression of 3 and minimum dispersion of 0.5), and those shared by at least 50% of the samples, 2,208 genes in total, were used for principal component analysis.

Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbors, and the neighborhood graph was then embedded into two-dimensional space using the uniform manifold approximation and projection (UMAP) algorithm. Cell clustering was performed using the Leiden algorithm. Cell type annotation was performed using the Sig-annot semi-automated besca module.
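The filtering, normalization and gene-selection steps described above can be sketched in plain Python. This is a minimal illustration and not the besca implementation: the function names are hypothetical, cells are represented as simple gene-to-count dictionaries, the QC bounds are assumed to be inclusive, and the logarithm is assumed to be the natural log.

```python
import math

# QC thresholds quoted in the text (inclusive bounds are an assumption).
MIN_GENES, MAX_GENES = 600, 6_500
MIN_COUNTS, MAX_COUNTS = 1_000, 60_000


def passes_qc(counts):
    """Return True if a cell (dict: gene -> UMI count) lies within the QC bounds."""
    n_genes = sum(1 for c in counts.values() if c > 0)
    n_counts = sum(counts.values())
    return (MIN_GENES <= n_genes <= MAX_GENES
            and MIN_COUNTS <= n_counts <= MAX_COUNTS)


def log_cp10k(counts):
    """Scale a cell to 10,000 total counts, then apply log(x + 1)."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * 10_000) for g, c in counts.items()}


def shared_variable_genes(per_sample_hvgs, min_fraction=0.5):
    """Keep genes flagged as highly variable in at least min_fraction of samples."""
    n_samples = len(per_sample_hvgs)
    tally = {}
    for hvgs in per_sample_hvgs:
        for gene in set(hvgs):
            tally[gene] = tally.get(gene, 0) + 1
    return sorted(g for g, n in tally.items() if n >= min_fraction * n_samples)
```

For example, `shared_variable_genes` expects one list of variable genes per sample and returns the genes that recur in at least half of them, which mirrors the "shared by at least 50% of the samples" rule in the text.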
The gene sets used for different cell types can be found under:

https://github.com/bedapub/besca/blob/main/besca/datasets/genesets/CellNames_scseqCMs6_sigs.gmt

First, each cluster was assigned to a cell type at different levels of granularity. Subsequently, annotations were manually inspected to resolve cluster mixtures, especially for different DC types. Cell type annotations were further curated by selecting a cluster and applying heuristic cutoffs on a combination of signature scores to reannotate individual cells. The per-cell signature scores were calculated with the scanpy function scanpy.tl.score_genes, using default parameters and besca signatures. Cells annotated as doublets were excluded from downstream analyses. To generate visualizations, such as the expression level of selected genes across conditions, custom scripts based mainly on besca and scanpy functions were used.

For more details, please refer to the publication.
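The idea of scoring cells against a signature and reannotating them with a cutoff can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the paper: the helper names are hypothetical, the GMT parser assumes the usual tab-separated layout (set name, description, then gene symbols), and the score uses an explicitly supplied control gene set, whereas scanpy.tl.score_genes samples its reference genes from expression-matched bins.

```python
def parse_gmt(lines):
    """Parse GMT lines: set name, description, then gene symbols (tab-separated)."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def signature_score(expression, signature, control):
    """Mean expression of signature genes minus mean expression of control genes.

    Simplification: the control set is passed in explicitly instead of being
    sampled from expression-matched bins as scanpy.tl.score_genes does.
    """
    def mean(genes):
        values = [expression.get(g, 0.0) for g in genes]
        return sum(values) / len(values) if values else 0.0

    return mean(signature) - mean(control)


def reannotate(label, score, cutoff, new_label):
    """Heuristic cutoff: relabel the cell when its score exceeds the cutoff."""
    return new_label if score > cutoff else label
```

A cell would be scored with `signature_score(expr, sets["some_signature"], control_genes)`, where `expr` maps gene symbols to normalized expression; the set name, control genes and cutoff value are placeholders to be chosen by the analyst.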

Provider:

figshare

Created:

2024-04-08
