MCA DGE Data
DataCite Commons. Updated 2025-06-01. Indexed 2024-07-27.
Download link:
https://figshare.com/articles/dataset/MCA_DGE_Data/5435866/7
Description:
MCA single-cell DGE data (cells with >500 UMI) for the following manuscript: Mapping the Mouse Cell Atlas by Microwell-seq

MCA_500more_dge.rar: The raw digital expression matrix (DGE) of more than 400,000 single cells, sorted by tissue. All cells have more than 500 transcripts. The batch genes were not removed.
MCA_BatchRemove_dge.zip: The DGE, with batch genes removed, of more than 200,000 primary single cells, sorted by tissue. Some tissues are not included because of relatively strong batch effects. This dataset can be used to make a global tissue tSNE plot and to do cross-tissue analysis.
MCA_CellAssignments.csv: The cell annotations, which include the cell names, cluster IDs, tissues of origin, experimental batches and cell barcodes.
MCA_Figure2-batch-removed.txt.tar.gz: The batch-removed DGE of approximately 60,000 high-quality cells. 1,500 cells were sampled from each of 43 tissues. This sampled data is used for Figure 2.
MCA_Figure2_Cell.info.xlsx: The annotations of the cells used in Figure 2.
Sheet1: The annotation of each cell used in Figure 2, including cell name, cluster ID and tissue of origin.
Sheet2: The annotations of the 98 clusters in Figure 2.
Sheet3: The composition of cell numbers across the 98 clusters and 43 tissues.
MCA_Batch Information.xlsx: The batch information, which includes the age and gender of the mouse and the experimental batches for the MCA data.

Batch effect removal
For cross-tissue comparison, we removed the batch gene background to improve presentation. We assume that, for each experimental batch, the cell barcodes with fewer than 500 UMI correspond to empty beads exposed to free RNA during the cell lysis, RNA capture and washing steps. The batch gene background value is defined as the average gene detection across all cellular barcodes with fewer than 500 UMI, multiplied by a coefficient of 2, and then rounded to the nearest integer. Genes detected in fewer than 25% of all cells are removed from the batch gene background list. We subtract the batch gene background for each cell from the digital expression matrix before making the cross-tissue comparison figures.
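
The batch gene background procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code, and it rests on several assumptions that the description does not settle: "average gene detection" is read as the mean UMI count per gene across low-UMI barcodes, the 25% detection rate is computed over the cells of the same batch, ties are rounded half up, and counts that would become negative after subtraction are dropped. The function names, the nested-dict input layout and the toy gene names are all hypothetical. The toy example uses a threshold of 5 instead of 500 to stay small.

```python
import math


def batch_background(barcodes, umi_threshold=500, coefficient=2,
                     min_detect_frac=0.25):
    """Estimate per-gene background counts for one experimental batch.

    barcodes maps each barcode to a {gene: UMI count} dict (assumed layout).
    """
    # Barcodes below the threshold are treated as empty beads, those above as cells.
    empty = [c for c in barcodes.values() if sum(c.values()) < umi_threshold]
    cells = [c for c in barcodes.values() if sum(c.values()) > umi_threshold]
    if not empty or not cells:
        return {}

    background = {}
    for gene in set().union(*empty):
        mean_count = sum(c.get(gene, 0) for c in empty) / len(empty)
        # Multiply by the coefficient, then round to the nearest integer
        # (half up; the tie-breaking rule is an assumption).
        value = math.floor(coefficient * mean_count + 0.5)
        # Fraction of cells in which the gene is detected at all.
        detected = sum(1 for c in cells if c.get(gene, 0) > 0) / len(cells)
        if value > 0 and detected >= min_detect_frac:
            background[gene] = value
    return background


def subtract_background(cell_counts, background):
    """Subtract the background from one cell, dropping non-positive counts
    (clipping at zero is an assumption)."""
    corrected = {}
    for gene, count in cell_counts.items():
        remaining = count - background.get(gene, 0)
        if remaining > 0:
            corrected[gene] = remaining
    return corrected


# Toy batch with hypothetical genes; threshold lowered to 5 for illustration.
toy = {
    "cell_1": {"gene_a": 6, "gene_b": 4},
    "cell_2": {"gene_a": 3, "gene_c": 5},
    "empty_1": {"gene_a": 2, "gene_d": 1},
    "empty_2": {"gene_a": 1, "gene_c": 1},
}
background = batch_background(toy, umi_threshold=5)
corrected = {name: subtract_background(toy[name], background)
             for name in ("cell_1", "cell_2")}
print(background)
print(corrected)
```

In the toy batch, gene_d appears only in an empty bead and in none of the cells, so it falls under the 25% detection cutoff and is left out of the background list. For matrices the size of the MCA files, the same logic would more naturally be written with array operations over a genes-by-barcodes matrix.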
Provider:
figshare
Created:
2018-10-22



