
maxATAC Data

Zenodo | Updated: 2022-06-27 | Indexed: 2026-04-07
Download link:
https://zenodo.org/record/6761767
Description:
<strong>Abstract</strong> Transcription factors (TFs) read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM <em>in vivo</em> Transcription-Factor Binding Site (<strong>TFBS</strong>) Prediction Challenge highlighted the value of chromatin accessibility data for TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) has surpassed DNase-seq as the most widely used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning remains the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built “<strong>maxATAC</strong>”, a suite of user-friendly deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the first collection of high-performance TFBS prediction models for ATAC-seq.
<strong>Repository Overview</strong> This repository contains all of the processed training data used by maxATAC for model training and benchmarking. Each directory is provided as a compressed archive with the extension .tar.gz. The repository contains the following directories: <pre><strong>ATAC_Peaks:</strong> ATAC-seq peak files called with MACS2 for the hg38 reference genome. These files have the extension .bed.gz.
<strong>ATAC_Signal_File:</strong> ATAC-seq signal files. Each file has been normalized for read depth and then min-max normalized to the range [0, 1], using the 99th-percentile value as the maximum. These files are provided as bigWig files with a .bw extension.
<strong>ChIP_Binding_File:</strong> ChIP-seq binding tracks. These are binary signal tracks in bigWig format corresponding to the peak sets in the ChIP_Peaks directory.
<strong>ChIP_Peaks:</strong> ChIP-seq peak files. This directory contains the ENCODE IDR peak sets and the peak sets created for the maxATAC publication. These files have the extension .bed.gz.
<strong>Full_Models:</strong> The current set of 127 maxATAC TF models. This directory includes the thresholding information and the .h5 model files.
<strong>hg38:</strong> The hg38 reference genome information used in the publication.
<strong>Prediction_and_Benchmarking:</strong> All chr1 predictions used for benchmarking under a round-robin training approach.
<strong>Tn5_CutSites:</strong> Tn5 cut sites, shifted +4 bp on the (+) strand and -5 bp on the (-) strand, then extended by 20 bp with bedtools slop. These files are provided as bzipped BED files; each file represents an individual biological replicate.
<strong>scATAC:</strong> Data used for scATAC-seq-based predictions.
</pre> For additional details, please see the maxATAC GitHub repository and the bioRxiv preprint.
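The ATAC_Signal_File entry describes two normalization steps: a read-depth correction followed by min-max scaling capped at the 99th percentile. The sketch below illustrates both on a plain list of per-position coverage values. It rests on assumptions that are not stated in the source: read depth is scaled to reads per million (RPM), the lower bound is the minimum observed value, and values above the 99th percentile are clipped to 1. The names `rpm_normalize`, `percentile`, and `minmax_percentile` are hypothetical, and the exact scaling factor and percentile method used by maxATAC may differ; the maxATAC repository is the authority on the actual pipeline.

```python
import math
from typing import List, Sequence


def rpm_normalize(counts: Sequence[float], total_reads: int) -> List[float]:
    """Scale raw coverage to reads per million (ASSUMED read-depth scheme)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return [c * scale for c in counts]


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted values."""
    s = sorted(values)
    if not s:
        raise ValueError("values must be non-empty")
    k = (len(s) - 1) * q / 100.0  # fractional rank into the sorted list
    f = math.floor(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)


def minmax_percentile(signal: Sequence[float], q: float = 99.0) -> List[float]:
    """Map the minimum to 0 and the q-th percentile to 1, clipping values above it."""
    lo = min(signal)
    hi = percentile(signal, q)
    if hi <= lo:
        # Flat signal: nothing to scale, so return all zeros.
        return [0.0 for _ in signal]
    return [min(max((x - lo) / (hi - lo), 0.0), 1.0) for x in signal]


if __name__ == "__main__":
    raw = [0, 2, 4, 6, 8, 10, 200]  # toy coverage with one extreme outlier
    rpm = rpm_normalize(raw, total_reads=2_000_000)
    print([round(v, 3) for v in minmax_percentile(rpm)])
```

Using a high percentile instead of the true maximum keeps a few extreme positions from compressing the rest of the signal toward 0.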
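The Tn5_CutSites entry describes a strand-aware shift followed by a slop. The following minimal sketch rests on several assumptions. Reads are BED-style intervals (0-based, half-open). The shift is applied to each read's 5' end: the start on the (+) strand, the end on the (-) strand. Each cut site is represented as a 1 bp interval, and "slopped 20 bp" means a symmetric extension as with `bedtools slop -b 20`, clipped to the chromosome bounds. `Interval`, `tn5_cut_site`, and `slop` are hypothetical names and are not part of maxATAC.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    strand: str  # "+" or "-"


def tn5_cut_site(read: Interval) -> Interval:
    """Return the Tn5-shifted 5' end of an aligned read as a 1 bp interval.

    ASSUMED convention: '+' reads shift the BED start by +4; '-' reads shift
    the BED end by -5 and keep the last base before that shifted end.
    """
    if read.strand == "+":
        pos = read.start + 4
    elif read.strand == "-":
        pos = read.end - 5 - 1
    else:
        raise ValueError(f"unknown strand: {read.strand!r}")
    return read._replace(start=pos, end=pos + 1)


def slop(iv: Interval, bp: int, chrom_size: int) -> Interval:
    """Extend both ends by bp, clipped to [0, chrom_size], like `bedtools slop -b bp`."""
    return iv._replace(start=max(0, iv.start - bp), end=min(chrom_size, iv.end + bp))


if __name__ == "__main__":
    reads = [Interval("chr1", 100, 150, "+"), Interval("chr1", 100, 150, "-")]
    for r in reads:
        # 248,956,422 bp is the length of chr1 in hg38.
        print(slop(tn5_cut_site(r), bp=20, chrom_size=248_956_422))
```

Under these assumptions, each slopped cut site covers 41 bp (the cut base plus 20 bp on each side) unless it is clipped at a chromosome end.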
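The ChIP_Binding_File entry describes binary tracks tied to the ChIP-seq peak sets. One simple way to turn peaks into such a track is to mark every base covered by at least one peak as 1 and every other base as 0. The sketch below does this for a single chromosome using a difference array. This construction is an assumption for illustration: it works at base-pair resolution and builds the track in memory, and it is not necessarily how the published bigWig files were produced. `binary_track` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def binary_track(peaks: Iterable[Tuple[int, int]], chrom_size: int) -> List[int]:
    """Per-base 0/1 track: 1 where at least one peak covers the base.

    Peaks are BED-style (start, end) pairs, 0-based and half-open. A difference
    array records where coverage starts and stops, so overlapping peaks are
    merged in a single pass instead of re-scanning each interval.
    """
    diff = [0] * (chrom_size + 1)
    for start, end in peaks:
        s, e = max(0, start), min(chrom_size, end)  # clip to chromosome bounds
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    track: List[int] = []
    depth = 0  # number of peaks covering the current base
    for i in range(chrom_size):
        depth += diff[i]
        track.append(1 if depth > 0 else 0)
    return track


if __name__ == "__main__":
    print(binary_track([(2, 5), (4, 7)], chrom_size=10))
    # → [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

For a whole chromosome, it is cheaper to emit runs of equal values as bedGraph-style intervals than to store one value per base.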
Providing institutions:
University of Cincinnati; Sreeja Parameswaran; Cincinnati Children's Hospital Medical Center; Balaji Iyer
Created:
2022-06-27