Processed (filtered and annotated) scRNA-seq data

Name: Processed (filtered and annotated) scRNA-seq data
Creator: figshare
Published: 2024-09-03 09:47:31
License: No description available

DataCite Commons. Updated: 2024-09-03. Indexed: 2024-09-03.

Download link:

https://figshare.com/articles/dataset/Processed_filtered_and_annotated_scRNA-seq_data/26894194

Description:

Single-cell RNA-seq data generated and reported as part of the manuscript entitled "Dose escalation study of the HLA-A2-WT1 CD3 bispecific antibody RO7283420 in relapsed/refractory acute myeloid leukemia" by Hutchings and Korfi et al. Processed (filtered and annotated) data are provided, which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse. processed.h5ad provides raw counts for the cells that passed QC, along with cell type annotation and relevant metadata, in the standard H5AD format. For instance, to load the data in R, try:
    library(zellkonverter)
    processed <- readH5AD(file = "./processed.h5ad", X_name = "counts")
##############################
Single-cell RNA sequencing was performed on cryopreserved bone marrow mononuclear cells (BMMCs) and peripheral blood mononuclear cells (PBMCs). After thawing and sorting for viability by flow cytometry, samples were processed and sequenced using the 10x Genomics Single Cell 5’ v2 Gene Expression protocol on the 10x Chromium platform. Raw sequencing data were processed with the Cell Ranger pipeline (v7.1.0) and a custom-built transcriptome reference using the GRCh38 genome and GENCODE v43 annotation, following the exact build steps provided by 10x Genomics.
Filtered feature barcode matrices obtained from Cell Ranger were imported and concatenated as a SingleCellExperiment object for downstream analysis in R. Per-cell quality control metrics were computed using the scuttle Bioconductor package. We applied a manual filtering scheme, using a hard cutoff of 10% for mitochondrial gene percentage, and cutoffs at the 5th percentile of total UMI counts and of the total number of detected genes. Next, we used the scDblFinder (v1.12.0) Bioconductor package to exclude potential doublets in the samples.
To identify and annotate distinct cell populations, we integrated BMMC and PBMC samples using scanorama (v1.7.4).
In brief, highly variable genes were selected using the modelGeneVar function from the scran (v1.26.2) Bioconductor package, blocking on individual samples and retaining the top 10% of highly variable genes. We excluded mitochondrial, ribosomal, and T- and B-cell receptor genes from the list of highly variable genes to minimize technical and/or donor effects. Feature counts for each cell were divided by the total counts for that cell and multiplied by 10^4, followed by natural-log transformation to yield logCP10k normalized values. Normalized data were z-standardized for the highly variable genes and used as input to scanorama with default parameters. We assembled a Seurat object using the integrated PC space obtained from scanorama to perform shared nearest-neighbor graph construction, clustering, and 2D visualization using the Seurat (v5.0.2) R package.
We used a set of known marker genes previously reported by Wang B, et al. Nat Commun. 2024 to annotate the main cell types. Malignant cells were identified as the three highest WT1-expressing populations with AML and hematopoietic progenitor gene expression profiles and are collectively referred to as “AML” for downstream analyses. In the integrated space, T and NK cells clustered together and separated from the remaining cell types. To further split T and NK cells, and their respective subsets, we used an atlas of bone marrow hematopoiesis (https://github.com/andygxzeng/BoneMarrowMap) and stratified T/NK cells into 4 categories (CD4T, CD8T, NK, and other), with the “other” category representing proliferating cells and negligible non-T/NK contaminating cells.
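The logCP10k normalization and z-standardization described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code, which was run in R. The function names are hypothetical, and two details are assumptions the description does not state: a pseudocount of 1 before the natural log (`log1p`), and the sample standard deviation (n - 1) in the z-score.

```python
import math


def log_cp10k(counts):
    """Normalize one cell's per-gene UMI counts to logCP10k.

    Each count is divided by the cell's total, multiplied by 10^4,
    and natural-log transformed. The +1 pseudocount (log1p) is an
    assumption; the description only says "natural-log transformation".
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has no information; return zeros rather than divide by 0.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1e4) for c in counts]


def z_standardize(values):
    """Z-score one gene's normalized values across cells.

    Uses the sample standard deviation (n - 1), which is an assumption.
    Genes with zero variance are returned as all zeros.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    sd = math.sqrt(var)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]
```

In this sketch, `log_cp10k` is applied per cell and `z_standardize` per gene, restricted to the highly variable genes, before the matrix would be passed to an integration tool.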
We further cross-checked the annotations obtained using selected markers, as well as those obtained from BoneMarrowMap, against the Azimuth BMMC and PBMC reference atlases provided by the HuBMAP consortium (https://azimuth.hubmapconsortium.org) and observed acceptable agreement between them.
For downstream analysis of CD8 T cells, we calculated the per-cell enrichment score of naive-like (CCR7, SELL, LEF1, TCF7, IL7R, LTB), cytotoxic (CX3CR1, PRF1, FGFBP2, GZMB, KLRG1, FCGR3A, GZMA, GZMH, GNLY, NKG7, KLRD1), predysfunctional / effector memory (GZMK, CXCR3, ZNF683, CD28, FYN, EOMES, CXCR4, CD44), and exhausted (PDCD1, HAVCR2, LAG3, CTLA4, TIGIT, CXCL13, LAYN, ENTPD1) signatures previously reported by van der Leun et al., Nat Rev Cancer 2020, using Adjusted Neighborhood Scoring (ANS) (https://github.com/lciernik/ANS_signature_scoring). The mean enrichment score per sample (median of 700 cells per sample) was subsequently used to compare baseline CD8 T-cell states in BMMCs from patients with or without a BM blast reduction.
For downstream analysis of AML cells, WT1+ / HLA-A+ / PSMB9+ cells were defined based on a non-zero UMI threshold (UMI>0). We further adopted a recent classification system of AML cells into 7 subsets (leukemia stem and progenitor cell (LSPC)-Quiescent, LSPC-Primed, LSPC-Cycle, granulocyte-monocyte progenitor (GMP)-like, Promonocyte-like, Monocyte-like, and conventional dendritic cell (cDC)-like), as reported by Zeng et al., Nat Med 2022, to identify different AML subpopulations. In brief, signature genes associated with each of these subsets were obtained from the original publication, and ANS scores were calculated per cell as described above. Finally, each cell was assigned to one of the seven subsets based on its maximum enrichment score. For more details, please refer to the publication.
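The final steps described above (marker positivity at UMI>0, assignment of each AML cell to its highest-scoring subset, and per-sample mean enrichment scores) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation. It assumes the per-cell ANS scores have already been computed elsewhere. All function names and the input layout are hypothetical, and the tie-breaking rule (first subset in list order wins) is an assumption the description does not specify.

```python
from collections import defaultdict

# The seven AML subsets named in the description (after Zeng et al.).
AML_SUBSETS = [
    "LSPC-Quiescent", "LSPC-Primed", "LSPC-Cycle", "GMP-like",
    "Promonocyte-like", "Monocyte-like", "cDC-like",
]


def is_positive(umi_count):
    """A cell is marker-positive (e.g. WT1+) if it has at least one UMI."""
    return umi_count > 0


def assign_subset(scores):
    """Return the subset with the maximum enrichment score for one cell.

    `scores` maps subset name -> precomputed per-cell score (hypothetical layout).
    Ties resolve to the first subset in AML_SUBSETS order (an assumption).
    """
    return max(AML_SUBSETS, key=lambda name: scores[name])


def mean_score_per_sample(cells):
    """Average a signature score over the cells of each sample.

    `cells` is an iterable of (sample_id, score) pairs.
    """
    by_sample = defaultdict(list)
    for sample_id, score in cells:
        by_sample[sample_id].append(score)
    return {s: sum(v) / len(v) for s, v in by_sample.items()}
```

For example, a cell whose highest score is for "GMP-like" is labeled "GMP-like", and the per-sample means are the values that would then be compared between patient groups.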

Provider:

figshare

Created:

2024-09-03
