
MCA DGE Data

DataCite Commons | Updated: 2020-09-01 | Indexed: 2024-08-17
Download link:
https://figshare.com/articles/dataset/MCA_DGE_Data/5435866/6
Description:
MCA single-cell DGE data (cells with >500 UMI) for the manuscript "Mapping the Mouse Cell Atlas by Microwell-seq".

MCA_500more_dge.rar: The raw digital expression matrix (DGE) of more than 400,000 single cells, sorted by tissue. All cells have more than 500 transcripts. The batch genes were not removed.
MCA_BatchRemove_dge.zip: The batch-gene-removed DGE of more than 400,000 single cells, sorted by tissue. This dataset can be used to make the global tissue tSNE plot and to do cross-tissue analysis.
MCA_CellAssignments.csv: The cell annotations, which include the cell names, cluster IDs, tissues of origin, experimental batches and cell barcodes.
MCA_Figure2-batch-removed.txt.tar.gz: The batch-removed DGE of approximately 60,000 high-quality cells. 1,500 cells were sampled from each of 43 tissues. This sampled data is used for Figure 2.
MCA_Figure2_Cell.info.xlsx: The annotations of the cells used in Figure 2.
    Sheet1: The annotations of each cell used in Figure 2, including cell names, cluster IDs and tissues of origin.
    Sheet2: The annotations of the 98 clusters in Figure 2.
    Sheet3: The cell-number composition of the 98 clusters and 43 tissues.
MCA_Batch Information.xlsx: The batch information, which includes the age and gender of the mice and the experimental batches for the MCA data.

Batch effect removal

For cross-tissue comparison, we removed the batch gene background to improve presentation. We assume that, for each experimental batch, the cell barcodes with fewer than 500 UMI correspond to empty beads exposed to free RNA during the cell lysis, RNA capture and washing steps. The batch gene background value is defined as the average gene detection across all cell barcodes with fewer than 500 UMI, multiplied by a coefficient of 2 and then rounded to the nearest integer. Genes detected in fewer than 25% of all cells are removed from the batch gene background list. We subtract the batch gene background for each cell from the digital expression matrix before making the cross-tissue comparison figures.
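The batch gene background procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function names, the genes-by-barcodes matrix layout, the reading of "all cells" as the barcodes at or above the UMI threshold, and the clipping of negative values to zero after subtraction are all choices made here that the description does not specify.

```python
import numpy as np


def batch_gene_background(counts, umi_threshold=500, coefficient=2.0,
                          min_detect_frac=0.25):
    """Per-gene background for ONE batch (hypothetical helper).

    counts: genes x barcodes matrix of UMI counts for a single batch.
    """
    counts = np.asarray(counts)
    umi_per_barcode = counts.sum(axis=0)
    empty = umi_per_barcode < umi_threshold   # assumed empty beads
    cells = ~empty                            # assumption: everything else is a cell
    if not empty.any() or not cells.any():
        return np.zeros(counts.shape[0], dtype=int)

    # Average detection over empty-bead barcodes, times the coefficient,
    # rounded to an integer (np.rint rounds exact halves to the even integer).
    background = np.rint(counts[:, empty].mean(axis=1) * coefficient).astype(int)

    # Drop genes detected in too few cells from the background list.
    detect_frac = (counts[:, cells] > 0).mean(axis=1)
    background[detect_frac < min_detect_frac] = 0
    return background


def subtract_background(counts, umi_threshold=500, **kwargs):
    """Return the cells-only matrix with the batch background subtracted."""
    counts = np.asarray(counts)
    background = batch_gene_background(counts, umi_threshold=umi_threshold, **kwargs)
    cells = counts.sum(axis=0) >= umi_threshold
    corrected = counts[:, cells] - background[:, None]
    # Assumption: negative values are clipped to zero; the source does not say.
    return np.clip(corrected, 0, None)


# Toy batch: 3 genes x 4 barcodes, with a small threshold for readability.
toy = np.array([[1, 1, 10, 12],
                [0, 1, 5, 0],
                [0, 0, 3, 0]])
print(batch_gene_background(toy, umi_threshold=10))
print(subtract_background(toy, umi_threshold=10))
```

In practice the function would be applied once per experimental batch, since the background is defined batch by batch.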

Provider:
figshare
Created:
2018-09-27
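Background and Challenges
Background Overview
MCA DGE Data is a dataset of mouse single-cell digital gene expression data. It includes both the raw expression matrices and the batch-effect-removed data, and is suited to cross-tissue analysis and visualization. The dataset supports the study "Mapping the Mouse Cell Atlas by Microwell-seq" and contains detailed information on more than 400,000 single cells.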