scCompass
Source: Mendeley Data. Updated: 2024-06-20. Indexed: 2024-06-29.
Download link:
https://www.doi.org/10.57760/sciencedb.08736
Resource description:
We construct a large-scale single-cell pre-training corpus of more than one hundred million single-cell transcriptomes from 13 species: humans, mice, monkeys, nematodes, zebrafish, fruit flies, rats, pigs, cows, dogs, horses, chickens, and sheep. Multispecies single-cell data provide a rich resource for understanding cellular heterogeneity across organisms. However, assembling and preprocessing such data is challenging because of biological differences and technical variability between species. In this study, we describe the assembly and preprocessing of multispecies single-cell training data from three common model organisms: human, mouse, and monkey. Human and mouse cells make up the largest shares of the corpus, at 50 million cells each. The data are curated from publicly available datasets in the NCBI, CellXgene, EBI, and DDBJ databases. To prepare the multispecies single-cell data for downstream analyses, we performed several preprocessing steps. For quality control, we exclude low-quality and damaged cells, defined as those with fewer than 7 expressed protein-coding or miRNA genes. We then apply normalization followed by a log1p transformation to reduce skewness.
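The preprocessing steps described above (gene-count filtering, normalization, log1p) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes a dense cells-by-genes count matrix whose columns are already restricted to protein-coding and miRNA genes, and it assumes per-cell library-size normalization to a fixed total. The description does not say which normalization was used, so the function name `preprocess_counts` and the `target_sum` parameter are hypothetical.

```python
import numpy as np


def preprocess_counts(counts, min_genes=7, target_sum=1e4):
    """Filter low-quality cells, normalize per cell, and apply log1p.

    counts: 2-D array of raw counts, shape (n_cells, n_genes).
    min_genes: cells with fewer detected genes are dropped (7 in the description).
    target_sum: assumed per-cell total after normalization (not from the source).
    Returns the transformed matrix and the boolean mask of kept cells.
    """
    counts = np.asarray(counts, dtype=np.float64)

    # Quality control: count genes with non-zero expression in each cell.
    genes_per_cell = (counts > 0).sum(axis=1)
    keep = genes_per_cell >= min_genes
    kept = counts[keep]

    # Library-size normalization: scale every cell to the same total count.
    totals = kept.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against division by zero
    normalized = kept / totals * target_sum

    # log1p compresses the long right tail of expression values.
    return np.log1p(normalized), keep


# Toy input: three cells, ten genes.
toy = np.zeros((3, 10))
toy[0, :] = 1   # 10 detected genes: kept
toy[1, :3] = 5  # 3 detected genes: dropped
toy[2, :7] = 2  # exactly 7 detected genes: kept
transformed, kept_mask = preprocess_counts(toy)
```

For matrices at the scale of this corpus, a sparse representation and a dedicated single-cell toolkit would be more practical. The dense NumPy version is only meant to make the order of operations explicit.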
Created:
2024-06-20



