Source Data and Simulated Datasets for Sant et al. 2025 - CHOIR improves significance-based detection of cell types and states from single-cell data
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14641221
Description:
This repository contains files related to Sant et al. Nature Genetics 2025 that were too large to include as part of the publication. Below, we describe each file and its contents.
1. Simulated datasets and associated parameters
Simulated_Data_Parameters.xlsx - This file contains the parameters used to create the simulated datasets described below. Briefly, using the R package Splatter, we generated 100 simulated datasets representing 1, 5, 10, or 20 distinct ground-truth cell populations, ranging from 500 to 25,000 cells. To assess how various aspects of snRNA-seq datasets affect CHOIR’s performance, we used five of the simulated datasets produced with Splatter as the baseline to generate 105 additional simulated datasets in which we incrementally reduced the prevalence of rare cell populations, the degree of differential expression, or the library size. Additionally, we generated 10 simulated datasets with multiple batches, with either balanced or imbalanced batch sizes, and 5 simulated datasets using Splatter’s simulation of cell differentiation trajectories. To ensure that our results were not dependent on the software used for data simulation, we also generated 10 datasets with the simulation method scDesign3 from real subsampled PBMC cell populations.
Simulated_Datasets.tar.gz - This tar.gz archive contains the 230 simulated datasets that were used to benchmark clustering tools for single-cell analysis in Sant et al. Nature Genetics 2025. The individual datasets have been stored as Seurat objects and combined into a single tar.gz file.
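The dataset counts given above (100 + 105 + 10 + 5 + 10) add up to the 230 datasets in the archive. The following Python sketch checks that tally and lists the files inside a downloaded copy of Simulated_Datasets.tar.gz without extracting it. The internal layout of the archive is not described here, so the helper names (expected_total, list_archive_members) and the local file path are illustrative assumptions. The Seurat objects themselves would still need to be loaded in R.

```python
import tarfile
from pathlib import Path

# Dataset counts as stated in the description; the category labels are paraphrased.
DATASET_COUNTS = {
    "Splatter, 1/5/10/20 ground-truth populations": 100,
    "reduced rare-population prevalence, differential expression, or library size": 105,
    "multiple batches (balanced or imbalanced)": 10,
    "Splatter differentiation trajectories": 5,
    "scDesign3 from subsampled PBMC populations": 10,
}


def expected_total() -> int:
    """Sum the per-category counts stated in the description."""
    return sum(DATASET_COUNTS.values())


def list_archive_members(archive_path):
    """Return the names of regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


if __name__ == "__main__":
    print("Datasets expected from the description:", expected_total())

    # Assumed local path; adjust to wherever the archive was downloaded.
    archive_file = Path("Simulated_Datasets.tar.gz")
    if archive_file.exists():
        names = list_archive_members(archive_file)
        print("Files found in archive:", len(names))
        for name in names[:10]:
            print(" ", name)
```

The number of files in the archive need not equal 230 if a dataset is stored as more than one file or if the archive contains auxiliary files, so the file count is best treated as a sanity check rather than a strict test.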
2. Source data and results for real-world datasets
SourceData1_RealData.xlsx - This Excel file contains the parameters used, the metrics obtained, the cell labels obtained, and any relevant single-cell-resolution results from the analyses of the following real-world datasets: snMultiome of human retina (Wang et al. Cell Genomics 2022), atlas-scale snRNA-seq of human brain (Siletti et al. Science 2023), scRNA-seq of mixed cell lines (Kinker et al. Nature Genetics 2020), CITE-seq of human PBMCs (Hao et al. Cell 2021), and sci-Space of mouse embryo (Srivatsan et al. Science 2021).
3. Source data and results for simulated datasets
SourceData2_SimulatedData.xlsx - This Excel file contains the parameters used, the metrics obtained, and the cell labels obtained for all simulated datasets analyzed in Sant et al. Nature Genetics 2025.
Created:
2025-03-15



