scCompass

Mendeley Data | Updated 2024-06-20 | Indexed 2024-06-29
Download link:
https://www.doi.org/10.57760/sciencedb.08736
Description:
We construct a large-scale single-cell pre-training corpus comprising more than one hundred million single-cell transcriptomes from 13 species: humans, mice, monkeys, nematodes, zebrafish, fruit flies, rats, pigs, cows, dogs, horses, chickens, and sheep. Multispecies single-cell data provide a rich resource for understanding cellular heterogeneity across organisms. However, assembling and preprocessing such data is challenging because of biological differences and technical variability between species. In this study, we describe the assembly and preprocessing of multispecies single-cell training data from three common model organisms: human, mouse, and monkey. Human and mouse cells account for the largest shares of the corpus, at 50 million cells each. The data are curated from publicly available datasets in the NCBI, CellXgene, EBI, and DDBJ databases. To prepare the data for downstream analyses, we performed several preprocessing steps. For quality control, we excluded low-quality and damaged cells, defined as cells with fewer than 7 detected protein-coding or miRNA genes. We then applied normalization and a log1p transformation to reduce skewness.
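
The preprocessing steps described above (quality-control filtering, normalization, log1p transformation) can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline. The description does not say which normalization method was used, so per-cell scaling to a fixed total (`TARGET_SUM = 10_000`) is an assumption here. The function names and the toy matrix are hypothetical, and the columns are assumed to be already restricted to protein-coding and miRNA genes.

```python
import math

MIN_GENES = 7          # from the description: cells with fewer than 7 detected genes are dropped
TARGET_SUM = 10_000.0  # assumption: per-cell scaling target, not stated in the description


def filter_cells(counts, min_genes=MIN_GENES):
    """Keep cells (rows) in which at least `min_genes` genes have a nonzero count."""
    return [cell for cell in counts if sum(1 for c in cell if c > 0) >= min_genes]


def normalize_total(counts, target_sum=TARGET_SUM):
    """Scale each cell so that its counts sum to `target_sum`."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([c * scale for c in cell])
    return normalized


def log1p_transform(counts):
    """Apply log(1 + x) element-wise to reduce skewness."""
    return [[math.log1p(c) for c in cell] for cell in counts]


def preprocess(counts):
    """Run the three steps in the order given in the description."""
    return log1p_transform(normalize_total(filter_cells(counts)))


# Hypothetical toy matrix: 3 cells (rows) x 8 genes (columns).
toy_counts = [
    [5, 0, 3, 1, 2, 8, 1, 4],  # 7 detected genes: kept
    [0, 0, 9, 0, 0, 0, 1, 0],  # 2 detected genes: removed
    [1, 1, 1, 1, 1, 1, 1, 1],  # 8 detected genes: kept
]
processed = preprocess(toy_counts)
```

At the scale of this corpus, the same steps would more plausibly be run on sparse matrices with a dedicated single-cell toolkit. The list-based version only makes the order and effect of each step explicit.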
Created:
2024-06-20