Processed AnnData objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics)

Name: Processed AnnData objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics)
Creator: figshare
Published: 2025-05-01 06:29:45
License: No description available

Source: DataCite Commons (updated 2025-05-01, indexed 2024-08-26)

Download link:

https://figshare.com/articles/dataset/Processed_AnnData_objects_for_GeneTrajectory_inference_Gene_Trajectory_Inference_for_Single-cell_Data_by_Optimal_Transport_Metrics_/25539547/1

Description:

These are processed AnnData objects (converted from Seurat objects) for the GeneTrajectory tutorials (https://github.com/KlugerLab/GeneTrajectory-python/).

Human myeloid dataset analysis

Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly variable gene selection, and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified by Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation.

For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel in which each cell's bandwidth is determined by the distance to its k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

Mouse embryo skin data analysis

We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analysis. After standard normalization, highly variable gene selection, and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type with the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
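The gene-filtering step in the description (keeping genes expressed by 0.5% to 75% of cells in the human data, or 1% to 50% in the mouse data) can be illustrated with a minimal NumPy sketch. This is not the GeneTrajectory implementation: the function name `filter_genes_by_expression_fraction` is hypothetical, and the sketch assumes a dense cells-by-genes matrix in which a gene counts as "expressed" in a cell when its value is greater than zero.

```python
import numpy as np


def filter_genes_by_expression_fraction(counts, min_frac, max_frac):
    """Return column indices of genes whose fraction of expressing cells
    lies in [min_frac, max_frac].

    counts: array of shape (n_cells, n_genes).
    Assumption of this sketch: "expressed" means a value > 0.
    """
    counts = np.asarray(counts)
    # Fraction of cells with a non-zero value, computed per gene (column).
    frac_expressing = (counts > 0).mean(axis=0)
    keep = (frac_expressing >= min_frac) & (frac_expressing <= max_frac)
    return np.flatnonzero(keep)


# Toy matrix: 4 cells x 3 genes.
# Gene 0 is expressed in 1 of 4 cells, gene 1 in none, gene 2 in all.
toy_counts = np.array([
    [1, 0, 2],
    [0, 0, 3],
    [0, 0, 1],
    [0, 0, 5],
])

# Thresholds taken from the mouse embryo skin analysis (1% to 50%).
selected = filter_genes_by_expression_fraction(toy_counts, 0.01, 0.50)
```

On the toy matrix only gene 0 passes: gene 1 is never expressed and gene 2 is expressed in every cell. With a real AnnData object the matrix is typically sparse, so the comparison and the mean would need a sparse-aware equivalent.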

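The description refers to a local-adaptive Gaussian kernel whose bandwidth is set by the distance to the k-th nearest neighbor, but it does not state the exact formula. The sketch below assumes one common self-tuning form, K_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance from item i to its k-th nearest neighbor. The function name `local_adaptive_gaussian_affinity` is hypothetical, and the kernel used by GeneTrajectory itself may differ.

```python
import numpy as np


def local_adaptive_gaussian_affinity(dist, k):
    """Convert a symmetric distance matrix into affinities using a Gaussian
    kernel with per-item bandwidths.

    Assumed form (not confirmed by the source):
        K[i, j] = exp(-dist[i, j]**2 / (sigma[i] * sigma[j]))
    where sigma[i] is the distance from item i to its k-th nearest neighbor.
    The diagonal of `dist` is expected to be zero, and all k-th neighbor
    distances are expected to be positive.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of items")
    # After sorting each row, column 0 holds the self-distance (0),
    # so column k holds the distance to the k-th nearest neighbor.
    sigma = np.sort(dist, axis=1)[:, k]
    return np.exp(-dist ** 2 / np.outer(sigma, sigma))


# Toy example: three points on a line at positions 0, 1 and 3.
positions = np.array([0.0, 1.0, 3.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
affinity = local_adaptive_gaussian_affinity(toy_dist, k=1)
```

Because the bandwidth product sigma_i * sigma_j is symmetric in i and j, the resulting affinity matrix stays symmetric whenever the input distances are, which is convenient for the subsequent Diffusion Map step.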

Provider: figshare
Created: 2024-04-04
