Datasets for predicting TF binding using Virtual ChIP-seq
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/823296
Description:
This repository contains datasets necessary for using the Virtual ChIP-seq software.
Virtual ChIP-seq requires the following datasets to predict transcription factor binding:
chipExpDir_AtoH_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters A-H.
chipExpDir_ItoZ_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters I-Z.
refTables_V1.1.0.tar.gz: PhastCons genomic conservation, FIMO PWM scores for JASPAR motifs, and ChIP-seq data of ENCODE and Cistrome database.
hg38_chrsize.tsv: Length of chromosomes in hg38
trainedModels_V1.0.0.tar.gz: Virtual ChIP-seq scikit-learn trained models saved in joblib format
.tar.gz: Pre-calculated matrices suitable for training with other algorithms or re-training with Virtual ChIP-seq.
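The archives listed above are ordinary gzip-compressed tarballs, so they can be unpacked with standard tools. The following is a minimal sketch using only the Python standard library; the helper name `extract_archive` and the destination directory `vcs_data` are hypothetical and not part of Virtual ChIP-seq, and the layout inside each archive is not assumed.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its member names.

    Hypothetical helper for illustration; not part of Virtual ChIP-seq.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        # Prefer the safer "data" extraction filter when this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    # Archive names as listed in this record; download them first.
    for name in ("refTables_V1.1.0.tar.gz", "trainedModels_V1.0.0.tar.gz"):
        if Path(name).exists():
            members = extract_archive(name, "vcs_data")
            print(name, "->", len(members), "members")
```

Extracting everything into one directory is only a convenience here; the Virtual ChIP-seq documentation should be consulted for the paths the software actually expects.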
Some predictive features of TF binding are identical across cell types and are stored together, for simplicity, in refTables_V1.0.0.tar.gz. Datasets from other cell types (named here as .tar.gz) can be used to re-train the model. The .tar.gz files contain pre-calculated predictive features of transcription factor binding for four chromosomes (5, 10, 15, and 20).
These features include:
PhastCons genomic conservation
FIMO score for sequence motifs of TF in the JASPAR database
Chromatin accessibility
TF binding in ENCODE + Cistrome DB datasets
Virtual ChIP-seq expression score
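When working with the per-cell-type feature matrices, it can help to know the lengths of the four chromosomes they cover. The sketch below reads hg38_chrsize.tsv and keeps only those chromosomes. It assumes the file has two tab-separated columns (chromosome name, length) and UCSC-style names such as `chr5`; neither assumption is confirmed by this record, and the function names are hypothetical.

```python
import csv

# Chromosomes covered by the pre-calculated matrices, per the description.
# The "chr" prefix is an assumption about naming in hg38_chrsize.tsv.
FEATURE_CHROMS = ("chr5", "chr10", "chr15", "chr20")


def read_chrom_sizes(path):
    """Parse a two-column TSV (name, length) into a dict of int lengths."""
    sizes = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and any header row with a non-numeric length.
            if len(row) < 2 or not row[1].isdigit():
                continue
            sizes[row[0]] = int(row[1])
    return sizes


def subset_sizes(sizes, chroms=FEATURE_CHROMS):
    """Keep only the requested chromosomes that are present in sizes."""
    return {c: sizes[c] for c in chroms if c in sizes}
```

A call such as `subset_sizes(read_chrom_sizes("hg38_chrsize.tsv"))` would then return a dictionary with at most four entries, depending on how the chromosomes are named in the file.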
Created:
2020-01-24



