
DeepC: Predicting chromatin interactions using megabase scaled deep neural networks and transfer learning (NG Capture-C)

Source: NIAID Data Ecosystem (indexed 2026-04-25)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP221613
Description:
Understanding 3D genome structure requires high-throughput, genome-wide approaches. However, assays for all-vs-all chromatin interaction mapping are expensive and time-consuming, which severely restricts their use in large-scale mutagenesis screens or in mapping the impact of sequence variants. Computational models sophisticated enough to capture the determinants of chromatin folding provide a unique window into the functional determinants of 3D genome structure and the effects of genome variation. A chromatin interaction predictor should work at the base-pair level but also incorporate large-scale genomic context, so that it captures both the large-scale and the fine-grained structures of chromatin architecture. To be flexible and generalizable, it should also be applicable to data it has not been explicitly trained on.

To develop a model with these properties, we designed a deep neural network (deepC) that uses transfer learning to accurately predict chromatin interactions from DNA sequence at megabase scale. The model generalizes well to unseen chromosomes and works across cell types, Hi-C data resolutions and a range of sequencing depths. DeepC integrates DNA sequence context on an unprecedented scale, bridging the different levels of resolution from base pairs to TADs. We demonstrate how this model allows us to investigate sequence determinants of chromatin folding at genome-wide scale and to predict the importance of regulatory elements and the impact of sequence variations.

Overall design: To validate in silico predictions of chromatin interactions at high resolution and scale, we performed NG Capture-C (Davies 2016) from 220 viewpoints in two cell lines, K562 (WIMM transgenics facility) and GM12878 LCLs (Coriell), for which predicted chromatin interactions have been generated. These viewpoints comprise 81 CTCF sites and 139 intra-domain viewpoints designed to avoid overlap with active elements. Library preparation and NG Capture-C were performed in biological triplicates, with four unique adapters used for each replicate to increase sequencing depth and minimize PCR duplicates. These were pooled for maximum resolution. Capture was performed with biotinylated oligonucleotides targeting sequences adjacent to DpnII sites at the viewpoints of interest.
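
As a quick sanity check on the design figures in the description, the following minimal Python sketch tallies the viewpoints and enumerates the replicate and adapter combinations. It assumes that the same design was applied to both cell lines and that each replicate/adapter combination corresponds to one indexed library; neither is stated explicitly in the record. The label format (`K562_rep1_adapter1`, etc.) is hypothetical and used only for illustration.

```python
from itertools import product

# Figures taken from the dataset description.
CELL_LINES = ["K562", "GM12878"]
N_CTCF_VIEWPOINTS = 81
N_INTRA_DOMAIN_VIEWPOINTS = 139
N_REPLICATES = 3  # biological triplicates
N_ADAPTERS = 4    # unique adapters per replicate


def total_viewpoints() -> int:
    """Sum the two viewpoint classes named in the description."""
    return N_CTCF_VIEWPOINTS + N_INTRA_DOMAIN_VIEWPOINTS


def library_labels() -> list[str]:
    """Enumerate one label per (cell line, replicate, adapter).

    Assumption: each combination is a separate indexed library and the
    design is identical in both cell lines. The naming is hypothetical.
    """
    return [
        f"{cell}_rep{rep}_adapter{adapter}"
        for cell, rep, adapter in product(
            CELL_LINES,
            range(1, N_REPLICATES + 1),
            range(1, N_ADAPTERS + 1),
        )
    ]


if __name__ == "__main__":
    labels = library_labels()
    print(f"viewpoints: {total_viewpoints()}")
    print(f"libraries under the stated assumptions: {len(labels)}")
```

The viewpoint total matches the 220 stated in the description. The library count is only as reliable as the assumptions above, so check it against the run table of the SRA record before relying on it.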

Created:
2020-10-19