Processing of Published Data and Construction of the Core UVmap Reference

DataCite Commons. Updated 2024-12-09; indexed 2025-01-06
Download link:
https://figshare.com/articles/dataset/Processing_of_Published_Data_and_Construction_of_the_Core_UVmap_Reference/27895560
Description:
In developing the core UVmap, we followed the GBmap pipeline approach of Ruiz-Moreno et al. [1], collecting data on 264,624 cells from 26 tumors and 29 normal tissues. We included only samples confirmed as healthy eye or primary uveal melanoma, or as healthy liver or metastatic liver, each containing at least 1,000 cells. The datasets for the core UVmap, which include Bakhoum et al. [2], Lin et al. [3], Ramachandran et al. [4], and Gautam et al. [5], were primarily available as raw count matrices. Where raw matrices were not available, BAM files were downloaded from the dbGaP cloud (Durante et al. [6]; phs001861.v1.p1; approved by dbGaP on May 24, 2022) or obtained directly from the authors (Pandiani et al. [7]), converted to FASTQ files, and re-aligned using the STARsolo v2.7.10a pipeline (https://github.com/cellgeni/STARsolo). We updated all gene names to the current HUGO nomenclature with HGNChelper and ensured that all clinical and diagnostic metadata remained consistent.

Before integrating the datasets, we applied stringent filters to retain only high-quality cells, excluding those with fewer than 500 genes, fewer than 1,000 UMI counts (where applicable), or more than 30% mitochondrial reads. Doublets in each droplet-based dataset were identified and removed using DoubletFinder.

To mitigate batch effects across datasets, we used a semi-supervised neural network model, single-cell ANnotation using Variational Inference (scANVI) [8], within the transfer-learning framework of the single-cell architectural surgery algorithm (scArches) [9]. scArches-scANVI requires prior knowledge of cell type labels to build the reference map. To standardize cell type labels from different sources, we annotated each dataset using both automated and manual methods.
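The cell-level quality filters stated in this description (fewer than 500 genes, fewer than 1,000 UMI counts where applicable, more than 30% mitochondrial reads) can be sketched in plain Python as follows. This is an illustration only, not the authors' code: the `CellQC` container, the function names, and the example cells are hypothetical, and treating a missing UMI count as "not applicable" is an assumption about how the "where applicable" clause was handled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the description; the boundary values themselves are kept.
MIN_GENES = 500        # exclude cells with fewer than 500 detected genes
MIN_UMI = 1000         # exclude cells with fewer than 1,000 UMI counts
MAX_PCT_MITO = 30.0    # exclude cells with more than 30% mitochondrial reads


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical container, for illustration only)."""
    barcode: str
    n_genes: int
    n_umi: Optional[int]   # None when the dataset has no UMI counts (assumption)
    pct_mito: float        # percentage of reads from mitochondrial genes


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three filters."""
    if cell.n_genes < MIN_GENES:
        return False
    # The UMI filter is skipped when no UMI count is available.
    if cell.n_umi is not None and cell.n_umi < MIN_UMI:
        return False
    return cell.pct_mito <= MAX_PCT_MITO


def filter_cells(cells: List[CellQC]) -> List[CellQC]:
    """Keep only the cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]


# Made-up example cells, one per failure mode plus two that pass.
cells = [
    CellQC("A", n_genes=1200, n_umi=3500, pct_mito=4.2),   # passes
    CellQC("B", n_genes=450, n_umi=2000, pct_mito=3.0),    # too few genes
    CellQC("C", n_genes=900, n_umi=800, pct_mito=5.0),     # too few UMIs
    CellQC("D", n_genes=900, n_umi=None, pct_mito=5.0),    # no UMI data, passes
    CellQC("E", n_genes=2000, n_umi=6000, pct_mito=41.0),  # too much mito
]
kept = [c.barcode for c in filter_cells(cells)]
print(kept)
```

In practice such filters are usually applied to a whole count matrix with a single-cell toolkit rather than cell by cell; the loop form is used here only to make each threshold explicit.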
For the automated process, we first collected lists of melanoma and GEP markers from Durante et al. [6], 16 cancer cell states [10], and a list of 174 adult eye and liver markers from Quan et al. [11]. We then performed UCell signature scoring [12] and applied a cutoff of 0.2 to call cells state- or marker-positive. Manual cell identities were then assigned based on the results of the automated process, the available original cell labels, and specific gene expression patterns assessed with the Wilcoxon rank-sum test. CNV analysis was conducted with the CopyKAT package [13], which categorizes cells as either diploid or aneuploid. This preliminary coarse cell type labeling facilitated training and integration of the model with scANVI-scArches. The analysis was run on the raw counts of the 5,000 most variable genes, using study as the batch variable and following the recommended tool parameters.

The output of the pipeline was a latent representation of the integrated data, which served as input for clustering and dimensionality-reduction visualizations. We applied Leiden clustering on a k-nearest neighbor graph (k-NNG) [14] to identify distinct cell populations, and Uniform Manifold Approximation and Projection (UMAP) [15] for two-dimensional embedding, using the plot1cell package [16] for UMAP visualization. After co-embedding, cell identities were refined manually for each cluster, using our unified preliminary annotations and the expression of specific marker genes to define each broad cell type/state.

1. Ruiz-Moreno, C. et al. doi: https://doi.org/10.1101/2022.08.27.505439
2. Bakhoum, M.F. et al. Loss of polycomb repressive complex 1 activity and chromosomal instability drive uveal melanoma progression. Nat Commun 12, 5402 (2021).
3. Lin, W. et al. Intra- and intertumoral heterogeneity of liver metastases in a patient with uveal melanoma revealed by single-cell RNA sequencing. Cold Spring Harb Mol Case Stud 7 (2021).
4. Ramachandran, P. et al. Resolving the fibrotic niche of human liver cirrhosis at single-cell level. Nature 575, 512-518 (2019).
5. Gautam, P. et al. Multi-species single-cell transcriptomic analysis of ocular compartment regulons. Nat Commun 12, 5675 (2021).
6. Durante, M.A. et al. Single-cell analysis reveals new evolutionary complexity in uveal melanoma. Nat Commun 11, 496 (2020).
7. Pandiani, C. et al. Single-cell RNA sequencing reveals intratumoral heterogeneity in primary uveal melanomas and identifies HES6 as a driver of the metastatic disease. Cell Death Differ 28, 1990-2000 (2021).
8. Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol Syst Biol 17, e9620 (2021).
9. Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat Biotechnol 40, 121-130 (2022).
10. Barkley, D. et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat Genet 54, 1192-1201 (2022).
11. Quan, F. et al. Annotation of cell types (ACT): a convenient web server for cell type annotation. Genome Med 15, 91 (2023).
12. Andreatta, M. & Carmona, S.J. UCell: Robust and scalable single-cell gene signature scoring. Comput Struct Biotechnol J 19, 3796-3798 (2021).
13. Gao, R. et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol 39, 599-608 (2021).
14. Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019).
15. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol (2018).
16. Wu, H. et al. Mapping the single-cell transcriptomic response of murine diabetic kidney disease to therapies. Cell Metab 34, 1064-1078.e6 (2022).
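As a small illustration of the automated annotation step in the description (signature scoring followed by a 0.2 cutoff), the sketch below applies the cutoff to precomputed per-cell signature scores. It does not compute UCell scores itself. The function name, the signature names, and the score values are hypothetical, and treating the cutoff as strict (a score must exceed 0.2) is an assumption, since the description does not say whether the boundary is inclusive.

```python
from typing import Dict, List

CUTOFF = 0.2  # signature-score cutoff stated in the description


def positive_signatures(
    scores: Dict[str, Dict[str, float]], cutoff: float = CUTOFF
) -> Dict[str, List[str]]:
    """For each cell, list the signatures whose score exceeds the cutoff.

    `scores` maps cell barcode -> {signature name -> score in [0, 1]}.
    Signature names are returned sorted so the output is deterministic.
    """
    return {
        cell: sorted(name for name, s in sig_scores.items() if s > cutoff)
        for cell, sig_scores in scores.items()
    }


# Made-up scores for three cells and three hypothetical signatures.
scores = {
    "cell_1": {"melanocyte": 0.45, "hepatocyte": 0.02, "stress_state": 0.21},
    "cell_2": {"melanocyte": 0.05, "hepatocyte": 0.38, "stress_state": 0.10},
    "cell_3": {"melanocyte": 0.20, "hepatocyte": 0.11, "stress_state": 0.03},
}
calls = positive_signatures(scores)
for cell, sigs in calls.items():
    print(cell, sigs)
```

A cell with no positive signature (such as `cell_3` here) would be left for the manual annotation step that the description says follows the automated one.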
Provider:
figshare
Created:
2024-12-09