
Clustering Deviation Index (CDI): A robust and accurate internal measure for evaluating scRNA-seq data clustering

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.08kprr55h
Description:
The clustering of cells has been widely used to explore the heterogeneity of cell populations in single-cell RNA-sequencing (scRNA-seq). We proposed a parametric model for monoclonal and polyclonal scRNA-seq data to evaluate clustering results. Based on the parametric model, we proposed a metric (CDI) to quantify the goodness-of-fit of cell clustering to the data. Here we present CT26.WT and T-CELL as two datasets to examine the performance of our model and metric. CT26.WT contains wild-type CT26 cells from the murine colorectal carcinoma cell line, and cells in CT26.WT are highly homogeneous. T-CELL contains T cells from tumor tissue of mice three weeks after 4T1 tumor injection. Using these datasets and public datasets, we validated our model and benchmarked our metric.

Methods

This dataset contains six files. Four of them (matrix.mtx, features.tsv, barcodes.tsv, CT26_bulk_30k.txt) are for CT26.WT, and the other two are for T-CELL.

CT26.WT sample preparation: Murine colorectal carcinoma cell line CT26.WT was obtained from the cell culture facility of Duke University and cultured in DMEM media (Sigma Aldrich). All cells were cultured at 37 °C. Single-cell clones were chosen and cultured for over 220 days. Bulk RNA-seq and single-cell RNA-seq samples were prepared on the same day.

CT26.WT bulk RNA-seq: Total RNA from ~1,000,000 cells from each group was extracted using the miniprep kit (Zymo Research) according to the manufacturer's instructions. The libraries were then sequenced on the Illumina sequencing platform (HiSeq X Ten) by the Novogene Corporation Inc. (CA, USA) with a paired-end 150 bp (PE 150) sequencing strategy.

CT26.WT scRNA-seq: A total of ~10,000 cells of each clone were selected for single-cell RNA-seq. Single-cell RNA sequencing libraries were prepared using Chromium Single Cell 3' Reagent Kits v3 (10x Genomics). The libraries were then sequenced on the Illumina sequencing platform by the Novogene Corporation Inc. (CA, USA) with a PE 150 sequencing strategy in single-index mode.

T-CELL scRNA-seq: Tumors were first collected from female mice 3 weeks after 4T1 tumor injection. Tissues were then dissociated into single cells and homogenized. T cells were separated out by flow sorting with a stringent gating threshold and sequenced on the 10x platform.

T-CELL filtering: We filtered out genes with non-zero counts in fewer than 2% of cells and removed cells with non-zero counts for fewer than 2% of genes. Eventually, 2,989 cells from five cell types with 7,893 genes were retained.

T-CELL annotation: The benchmark clustering labels of the T-CELL population were generated as a combination of protein-marker-based flow sorting labels and bioinformatics labels from Seurat v2. For evaluation purposes, we selected 5 distinct cell types: Regulatory Trm cells, Classical CD4 Tem cells, CD8 Trm cells, CD8 Tcm cells, and Active EM-like Treg cells.
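The T-CELL filtering rule described above can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a dense genes-by-cells count matrix, applies the gene filter first and then the cell filter on the remaining genes (the description does not state the order), and treats "less than 2%" as a strict threshold, so rows and columns at exactly 2% are kept. The function name `filter_sparse_counts` and the parameter `min_frac` are hypothetical.

```python
import numpy as np


def filter_sparse_counts(counts, min_frac=0.02):
    """Drop rarely detected genes, then cells with few detected genes.

    counts   : 2-D array-like of shape (n_genes, n_cells)  [assumed layout]
    min_frac : minimum fraction of non-zero entries needed to keep
               a gene (row) or a cell (column)
    Returns the filtered matrix plus boolean masks over the original
    genes and cells.
    """
    counts = np.asarray(counts)
    nonzero = counts > 0

    # Keep genes detected in at least min_frac of all cells.
    gene_keep = nonzero.mean(axis=1) >= min_frac

    # Assumption: the cell filter is evaluated on the retained genes only.
    cell_keep = nonzero[gene_keep].mean(axis=0) >= min_frac

    filtered = counts[gene_keep][:, cell_keep]
    return filtered, gene_keep, cell_keep
```

The returned masks can be used to subset the matching gene and barcode lists so that the labels stay aligned with the filtered matrix. For a matrix the size of a full 10x run, a sparse representation would be a more practical choice than the dense array used here.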
Created:
2022-10-03