scCompass

Mendeley Data, updated 2024-06-20, indexed 2024-06-29

Download link:

https://www.doi.org/10.57760/sciencedb.08736


Description:

We construct a large-scale single-cell pre-training corpus. The corpus contains more than one hundred million single-cell transcriptomes from 13 species: humans, mice, monkeys, nematodes, zebrafish, fruit flies, rats, pigs, cows, dogs, horses, chickens, and sheep. Multispecies single-cell data provide a rich resource for understanding cellular heterogeneity across organisms. However, assembling and preprocessing such data is challenging because of biological differences and technical variability between species. In this study, we describe the assembly and preprocessing of multispecies single-cell training data from three common model organisms: human, mouse, and monkey. Among the species, humans and mice account for the largest shares of the corpus, with 50 million cells each. The data are curated from publicly available datasets in the NCBI, CellXgene, EBI, and DDBJ databases. To prepare the data for downstream analyses, we performed several preprocessing steps. For quality control, we exclude low-quality and damaged cells, defined as cells with fewer than 7 detected protein-coding or miRNA genes. We then apply normalization followed by a log1p transformation to reduce skewness.
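The preprocessing steps in the description (gene-count filter, normalization, log1p) can be illustrated with a minimal sketch. This is not the authors' pipeline. The function name `preprocess`, the `gene_is_eligible` mask, and the scaling target of 10,000 counts per cell are assumptions made for illustration, since the description does not state the normalization method. Restricting the output to protein-coding and miRNA genes is likewise an assumption.

```python
import math

MIN_GENES = 7          # threshold stated in the description
TARGET_SUM = 10_000.0  # assumption: the description gives no scaling target


def preprocess(counts, gene_is_eligible, min_genes=MIN_GENES, target_sum=TARGET_SUM):
    """Filter, normalize, and log1p-transform a cells x genes count matrix.

    counts: list of rows, one row of raw counts per cell.
    gene_is_eligible: one bool per gene, True for protein-coding or miRNA
        genes (hypothetical mask; a real one would come from an annotation).
    Returns (processed_rows, kept_cell_indices).
    """
    processed, kept = [], []
    for cell_index, row in enumerate(counts):
        # Keep only the eligible genes for this cell.
        eligible = [c for c, ok in zip(row, gene_is_eligible) if ok]
        # Quality control: drop cells with fewer than min_genes detected genes.
        detected = sum(1 for c in eligible if c > 0)
        if detected < min_genes:
            continue
        # Library-size normalization: rescale each cell to target_sum counts.
        total = sum(eligible)
        scale = target_sum / total if total > 0 else 0.0
        # log1p reduces the skewness of the normalized counts.
        processed.append([math.log1p(c * scale) for c in eligible])
        kept.append(cell_index)
    return processed, kept


# Toy example: 3 cells, 10 genes, the first 8 genes eligible.
toy_counts = [
    [1] * 10,                        # 8 eligible genes detected: kept
    [5, 5, 5, 0, 0, 0, 0, 0, 0, 0],  # 3 eligible genes detected: dropped
    [2] * 7 + [0, 0, 0],             # 7 eligible genes detected: kept
]
toy_mask = [True] * 8 + [False] * 2
toy_out, toy_kept = preprocess(toy_counts, toy_mask)
```

In the toy example the second cell is removed because only 3 eligible genes are detected, while the third cell sits exactly at the threshold of 7 and is retained.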

Created:

2024-06-20
