Data archive: CICT for single cell RNA-seq network inference
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8025871
Description:
This archive contains the benchmarking input data and results for inferring gene regulatory networks (GRNs) from single-cell gene expression data with the Causal Inference with Composition of Transactions (CICT) method and a selected set of published methods. It accompanies the manuscript "Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions" (Shojaee and Huang, Brief Bioinform 2023. DOI: 10.1093/bib/bbad370). The CICT code is available in the GitHub repo (https://github.com/hlab1/scRNAseqWithCICT/).
The original CICT algorithm was described in Shojaee et al. (arXiv:1608.02658, 2016). The benchmarked methods are those included in the BEELINE benchmarking pipeline (Pratapa et al., Nat Methods 2020), to which we added DEEPDRIM (Chen et al., Brief Bioinform 2021), SCENIC (Aibar et al., Nat Methods 2017), Inferelator 3.0 (Gibbs et al., Bioinformatics 2022), and CellOracle (Kamimoto et al., Nature 2023). Within each dataset, the output subdirectories are named as follows:
* CICT_ewMIshrink_RFmaxdepth10_RFntrees20/: CICT for simulated data
* CICT_v2/: CICT for experimental data
* CELLORACLEDB/: CellOracle for experimental data
* DEEPDRIM72_ewMIshrink_RFmaxdepth10_RFntrees20/: DEEPDRIM for simulated data
* DEEPDRIM72_v2/: DEEPDRIM for experimental data
* INFERELATOR38_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-Prior for simulated data
* INFERELATOR38_v2/: Inferelator-Prior for experimental data
* INFERELATOR34_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-NoPrior for simulated data
* INFERELATOR34_v2/: Inferelator-NoPrior for experimental data
* GENIE3/: GENIE3
* GRNBOOST2/: GRNBOOST2
* LEAP/: LEAP
* PIDC/: PIDC
* PPCOR/: PPCOR
* SCENICDB/: SCENIC for experimental data
* SCNS/: SCNS
* SCODE/: SCODE
* SCRIBE/: SCRIBE
* SINCERITIES/: SINCERITIES
* SINGE/: SINGE
* RANDOM/: RANDOM
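The naming convention in the list above can be turned into a small lookup when scripting over the archive. The following is a minimal sketch, not part of the archive itself: the function name `classify_output_dir` is hypothetical, and it assumes that the `_ewMIshrink_RFmaxdepth10_RFntrees20` suffix consistently marks simulated-data runs while `_v2` and the `DB` names mark experimental-data runs.

```python
# Hypothetical helper: split an output directory name into (method, data kind).
# Assumption: the long suffix marks simulated-data runs, "_v2" and the two
# "...DB" names mark experimental-data runs, and all other names carry no marker.
SIMULATED_SUFFIX = "_ewMIshrink_RFmaxdepth10_RFntrees20"
EXPERIMENTAL_SUFFIX = "_v2"
EXPERIMENTAL_DB_DIRS = {"CELLORACLEDB": "CELLORACLE", "SCENICDB": "SCENIC"}


def classify_output_dir(name):
    """Return (method_label, data_kind) for one output subdirectory name."""
    name = name.rstrip("/")
    if name.endswith(SIMULATED_SUFFIX):
        return name[: -len(SIMULATED_SUFFIX)], "simulated"
    if name.endswith(EXPERIMENTAL_SUFFIX):
        return name[: -len(EXPERIMENTAL_SUFFIX)], "experimental"
    if name in EXPERIMENTAL_DB_DIRS:
        return EXPERIMENTAL_DB_DIRS[name], "experimental"
    # Methods such as GENIE3/ or PIDC/ have no data-kind marker in the name.
    return name, "unspecified"


if __name__ == "__main__":
    for d in ("CICT_v2/", "DEEPDRIM72_ewMIshrink_RFmaxdepth10_RFntrees20/", "PIDC/"):
        print(d, classify_output_dir(d))
```

The method labels returned here (for example `DEEPDRIM72`) are simply the directory prefixes; they are not mapped to display names.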
The methods were benchmarked against two kinds of scRNA-seq datasets:
* Simulated datasets produced by the SERGIO simulator from a synthetic network (Dibaeinia et al., Cell Systems 2020), including complete datasets and datasets with dropouts with shape parameter k=6.5 and rate parameter q=10, 30, 50, 70, 80.
* Experimental datasets compiled by the BEELINE pipeline, evaluated at three different levels (L0, L1 and L2) with three types of ground truth networks.
  * Evaluation levels:
    * L0: 500 highly varying genes plus TFs
    * L1: 1000 highly varying genes plus TFs
    * L2: 500 highly varying genes, TFs, and 500 genes randomly selected from outside the 1000 highly varying genes of L1
  * Types of ground truth:
    * Cell-type-specific ChIP-seq ground truth (L0, L1, L2)
    * Non-specific ChIP-seq ground truth (L0_ns, L1_ns, L2_ns)
    * Loss-of-function/gain-of-function ground truth (L0_lofgof, L1_lofgof, L2_lofgof)
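The nine evaluation labels for the experimental datasets follow from combining the three levels with the three ground-truth types. As an illustrative sketch (the names `LEVELS`, `GROUND_TRUTH_SUFFIXES` and `evaluation_labels` are hypothetical and not taken from the archive), they can be enumerated like this:

```python
from itertools import product

# Levels and suffixes as listed in the description above.
LEVELS = ("L0", "L1", "L2")
GROUND_TRUTH_SUFFIXES = {
    "cell-type-specific ChIP-seq": "",
    "non-specific ChIP-seq": "_ns",
    "loss-of-function/gain-of-function": "_lofgof",
}


def evaluation_labels():
    """Map each label (e.g. 'L1_ns') to its (level, ground-truth type) pair."""
    return {
        level + suffix: (level, truth_type)
        for level, (truth_type, suffix) in product(LEVELS, GROUND_TRUTH_SUFFIXES.items())
    }


if __name__ == "__main__":
    for label, (level, truth_type) in evaluation_labels().items():
        print(f"{label}: level {level}, {truth_type} ground truth")
```

How these labels appear in the archive's actual file or directory names is not stated in the description, so the mapping should be checked against the archive contents before use.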
The directory structure follows that of the BEELINE benchmarking pipeline. For complete details, please see the BEELINE documentation (https://murali-group.github.io/Beeline/) and GitHub repo (https://github.com/Murali-group/Beeline).
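For readers who want to inspect inferred networks directly, the sketch below reads one ranked edge list. It rests on assumptions that should be verified against the archive: that each method directory holds a tab-separated file in the BEELINE style with `Gene1`, `Gene2` and `EdgeWeight` columns (commonly named `rankedEdges.csv`), and that rows are already ordered by rank. The function name `load_ranked_edges` is hypothetical.

```python
import csv


def load_ranked_edges(path, top_k=None):
    """Read a tab-separated edge list and return (source, target, weight) tuples.

    Assumes a header row with Gene1, Gene2 and EdgeWeight columns and keeps
    the file's own ordering, which is assumed to be the ranking.
    """
    edges = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            edges.append((row["Gene1"], row["Gene2"], float(row["EdgeWeight"])))
    return edges if top_k is None else edges[:top_k]
```

A call such as `load_ranked_edges("rankedEdges.csv", top_k=100)` would then return the first 100 edges; the path here is a placeholder, not a location confirmed by the archive description.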
Created:
2023-10-31



