Topological Identification and Interpretation for Single-cell Epigenetic Regulation Elucidation in Multi-tasks using scAGDE
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-09-08.
Download link:
https://springernature.figshare.com/articles/dataset/Topological_Identification_and_Interpretation_for_Single-cell_Epigenetic_Regulation_Elucidation_in_Multi-tasks_using_scAGDE/26067511/1
Description:
We generated 17 simulated datasets from bulk ATAC-seq data of bone marrow comprising six FACS-sorted cell populations. Following a previously published benchmarking framework for scATAC-seq tools, we set the parameter n, which determines the fragment count per cell, to 250, 500, 1500, 2500, and 5000, yielding five datasets of varying sequencing depth. We set the parameter q, which controls the proportion of cell-specific reads, to 0, 0.1, 0.2, 0.3, and 0.4, yielding five datasets of differing noise levels. Lastly, we randomly dropped valid reads at rates ranging from 10% to 70%, generating seven datasets with varying degrees of dropout.
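The simulation design above can be sketched as a small parameter grid plus a read-dropout step. This is only an illustrative sketch, not the authors' pipeline: the function names `simulation_grid` and `drop_reads` are hypothetical, the seven dropout rates are assumed to be evenly spaced at 10% steps (the description gives only the range and the count), and dropout is assumed to remove each read independently.

```python
import random

DEPTHS = [250, 500, 1500, 2500, 5000]   # parameter n: fragments per cell
NOISE = [0.0, 0.1, 0.2, 0.3, 0.4]       # parameter q, as listed in the description
# Assumption: seven evenly spaced rates between 10% and 70%.
DROPOUT = [round(0.1 * k, 1) for k in range(1, 8)]


def simulation_grid():
    """List one configuration per simulated dataset (5 + 5 + 7 = 17)."""
    grid = [{"vary": "depth", "n": n} for n in DEPTHS]
    grid += [{"vary": "noise", "q": q} for q in NOISE]
    grid += [{"vary": "dropout", "rate": r} for r in DROPOUT]
    return grid


def drop_reads(counts, rate, seed=0):
    """Drop each read independently with probability `rate`.

    `counts` is a cells-by-peaks matrix of non-negative integers
    given as a list of lists; a thinned copy is returned.
    """
    rng = random.Random(seed)
    return [
        [sum(rng.random() >= rate for _ in range(c)) for c in row]
        for row in counts
    ]
```

Calling `drop_reads(matrix, 0.3)` on a small count matrix returns a matrix of the same shape in which no entry exceeds its original value.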
Additionally, to validate the effectiveness of scAGDE, we collected 11 publicly available scATAC-seq datasets with given cell type labels for benchmarking. These datasets were generated on different platforms, include human and mouse samples, and vary in sparsity and scale. Six datasets annotated through computational approaches were ‘Forebrain’ (GSE100033), ‘Splenocyte’ (E-MTAB-6714), ‘GM12878vsHEK’ (GSE65360), ‘GM12878vsHL’ (GSE149683), ‘Lung’ (GSE149683), and ‘Liver’ (GSE65360). Three datasets containing FACS-sorted cell populations were ‘Blood2K’ (GSE96772), ‘10XBlood’ (GSE129785), and ‘DropBlood’ (GSE123581). The remaining two datasets were ‘Leukemia’ (GSE74310), which mixes cells from a healthy donor with leukemia cells from two acute myeloid leukemia (AML) patients, and ‘InSilico’ (GSE65360), which combines six individual scATAC-seq datasets from distinct cell lines. The human fetal atlas dataset from Domcke et al. can be obtained from the public resource at https://descartes.brotmanbaty.org/bbi/human-chromatin-during-development/. The human brain dataset, downloadable from GSE184462, comes from a single-cell scATAC-seq atlas of the human genome. The reference single-cell RNA-seq dataset from human brain tissue used in our study can be found at GSE207334. All processed datasets for the benchmarking analysis, together with the human brain dataset, have been deposited in the Zenodo database at https://doi.org/10.5281/zenodo.11609252.
Provider:
figshare
Created:
2025-02-17



