
PMD hypomethylation human (hg19) neural network scores

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6477287
Description:
Global loss of DNA methylation in mammalian genomes occurs cumulatively as a mitotic process during aging and cancer, primarily in Partially Methylated Domains (PMDs). It has been shown that local sequence context (100bp) has a strong effect on the rate of demethylation of individual CpG dinucleotides within PMDs. Here, we train a deep learning model to characterize this sequence dependence further, finding that methylation loss can be predicted from a CpG’s 150bp sequence context alone with an AUC of 0.95. We use re-methylation rates of newly synthesized DNA to show that CpGs with fast-loss sequence context are inefficiently re-methylated. Interestingly, we find that the 10% of CpGs predicted to have the “slowest” rate of loss lose almost no DNA methylation in healthy cell types. These same slow-loss CpGs lose a significant amount of DNA methylation in cancer, suggesting that they could be responsible for deregulation of genes and transposable elements that are associated with DNA hypomethylation in cancer.

This directory contains the Sep. 20, 2020 version of the human (hg19) CpG hypomethylation neural network scores, in one gzip-compressed TSV file per chromosome. The score is the predicted probability that a CpG's sequence context makes it a fast-hypomethylation CpG. It was produced by a neural network model trained on two independent input datasets.

Files included in this directory:
  - chr*.tsv.gz: neural network score of each CpG on each chromosome, using hg19 coordinates. chrX and chrY are omitted.

Each row describes one CpG and provides (1) chromosome, (2) coordinate of the C on the forward (Watson) strand of the reference genome, one-based, (3) neural network score, (4) number of CpGs within the 150bp sequence centered on this CpG, including the center CpG, (5) whether the CpG is within a CpG island (0, no; 1, yes), and (6) whether the CpG is within the ENCODE blacklist (0, no; 1, yes).

The CpG islands are the union of the Irizarry (Irizarry et al. 2009, Nat Genet), Takai-Jones (Takai et al. 2002, PNAS), and Gardiner-Garden (Gardiner-Garden et al. 1987, J Mol Biol) CGI sets. The blacklist was downloaded from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
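The row layout described above can be read with a few lines of Python. This is a minimal sketch, not code distributed with the dataset. It assumes the files are tab-separated with the six columns in the listed order, and it skips any row whose second field is not an integer, since the description does not say whether a header line is present. The field names (`chrom`, `pos`, `score`, `n_cpg_150bp`, `in_cgi`, `in_blacklist`) and the helper `slowest_fraction` are hypothetical. Treating the lowest-scoring CpGs as the "slowest" ones is an assumption based on the score being a fast-loss probability, and the authors' exact definition of the slowest 10% may differ.

```python
import csv
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class CpGScore(NamedTuple):
    """One row of a chr*.tsv.gz file (field names are illustrative)."""
    chrom: str
    pos: int            # one-based coordinate of the C on the forward strand
    score: float        # predicted probability of being a fast-loss CpG
    n_cpg_150bp: int    # CpGs in the 150bp window, including the center CpG
    in_cgi: bool        # column 5: within a CpG island
    in_blacklist: bool  # column 6: within the ENCODE blacklist


def parse_scores(handle: TextIO) -> Iterator[CpGScore]:
    """Yield one record per data row of an open text handle.

    Rows with fewer than six fields, or whose position is not an integer
    (for example a header line, if the files have one), are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6 or not row[1].isdigit():
            continue
        yield CpGScore(row[0], int(row[1]), float(row[2]), int(row[3]),
                       row[4] == "1", row[5] == "1")


def read_scores(path: str) -> list:
    """Read every record from one gzip-compressed per-chromosome file."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(parse_scores(handle))


def slowest_fraction(records, fraction: float = 0.10) -> list:
    """Return the lowest-scoring share of records, excluding blacklisted CpGs.

    Dropping blacklisted CpGs is a choice made for this sketch, not a step
    taken from the dataset description.
    """
    kept = sorted((r for r in records if not r.in_blacklist),
                  key=lambda r: r.score)
    n = max(1, int(len(kept) * fraction)) if kept else 0
    return kept[:n]


# Synthetic rows for illustration only; these values are not from the dataset.
demo = "chr1\t10469\t0.82\t12\t0\t0\nchr1\t10471\t0.15\t12\t1\t0\n"
records = list(parse_scores(io.StringIO(demo)))
slow = slowest_fraction(records, fraction=0.5)
```

For a real file, `read_scores("chr1.tsv.gz")` would return the full list for that chromosome (the filename is inferred from the `chr*.tsv.gz` pattern). For genome-wide work, a streaming pass over `parse_scores` avoids holding every record in memory.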
Created:
2022-06-13