Supplementary files for study on modeling DSB with random forests
Source: DataCite Commons. Updated: 2023-04-27. Indexed: 2025-04-17
Download link:
https://datashare.ed.ac.uk/handle/10283/3103
Description:
Structural variants (SVs) are known to play important roles in a variety of cancers, but their origins and functional consequences are still poorly understood. The nonrandom distributions of these variants across tumour genomes are often assumed to reflect selective processes, but, as with single nucleotide variants, SV mutation rates often reflect the underlying chromatin and other features at a locus. Inferring which SVs may be under selection in tumourigenesis therefore remains challenging, though identifying such variants may lead to new diagnostic and therapeutic targets. Many SVs are thought to emerge via errors in the repair processes following DNA double strand breaks (DSBs) and a variety of studies have experimentally measured DSB frequencies across the genome in cell lines. Using these data we derive the first quantitative genome-wide models of DSB susceptibility, based upon underlying chromatin and sequence features. These models provide high predictive accuracy and novel insights into the mutational mechanisms generating DSBs. Models trained in one cell type can be successfully applied to others, but a substantial proportion of DSBs appear to reflect cell type specific processes. We also show that regions harboring unusually high tumour SV breakpoint frequencies occur within well modeled regions of the genome but often display DSB frequencies inconsistent with DSB model predictions. Using model predictions as a proxy for susceptibility to DSBs in tumours, many SV hotspots appear to be poorly explained by selectively neutral mutational bias alone. A substantial number of hotspots show unexpectedly high SV breakpoint frequencies given their predicted susceptibility to mutation, and are therefore credible targets of positive selection in tumours. These putatively positively selected hotspots are enriched for genes previously shown to be oncogenic. 
In contrast, several hundred regions across the genome show unexpectedly low levels of SVs, given their relatively high susceptibility to mutation. These novel ‘coldspot’ regions appear to be subject to purifying selection in tumours and are enriched for active promoters and enhancers. We conclude that models of DSB susceptibility offer a rigorous approach to the inference of SVs putatively subject to selection in tumours.
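As a rough illustration of the hotspot and coldspot reasoning in the description, the sketch below compares observed SV breakpoint counts per region with the counts expected from a predicted DSB susceptibility score. It is not the study's method: the region names, the numbers, the assumption that expected counts scale in proportion to predicted susceptibility, and the Poisson-style z-score threshold are all hypothetical choices made for illustration.

```python
import math


def classify_regions(regions, z_threshold=3.0):
    """Label each region as 'hotspot', 'coldspot', or 'neutral'.

    regions: list of (name, predicted_susceptibility, observed_sv_count).
    Expected counts are assumed proportional to predicted susceptibility
    (a hypothetical neutral model, not the one fitted in the study).
    """
    total_pred = sum(pred for _, pred, _ in regions)
    total_obs = sum(obs for _, _, obs in regions)
    labels = {}
    for name, pred, obs in regions:
        # Share of all observed breakpoints this region would receive
        # if breakpoints followed predicted susceptibility alone.
        expected = total_obs * pred / total_pred
        # Poisson approximation: variance equals the expected count.
        z = (obs - expected) / math.sqrt(expected) if expected > 0 else 0.0
        if z >= z_threshold:
            labels[name] = "hotspot"    # more SVs than susceptibility predicts
        elif z <= -z_threshold:
            labels[name] = "coldspot"   # fewer SVs than susceptibility predicts
        else:
            labels[name] = "neutral"
    return labels


# Made-up example data: (region, predicted susceptibility, observed SV count).
example = [
    ("region_a", 1.0, 100),
    ("region_b", 1.0, 20),
    ("region_c", 2.0, 40),
    ("region_d", 4.0, 40),
]

labels = classify_regions(example)
```

In this toy input, `region_a` has far more breakpoints than its low predicted susceptibility would suggest, and `region_d` has far fewer than its high susceptibility would suggest. Because expected counts are rescaled from the observed total, a very strong hotspot inflates the expectation for every other region, so a real analysis would likely need a more careful null model.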
Provider:
University of Edinburgh. Institute of Genetics and Molecular Medicine
Created:
2018-06-19



