
Whole Genome Bisulfite sequencing: Allele-specific DNA methylation is increased in cancers and its dense mapping in normal plus neoplastic cells increases the yield of disease-associated regulatory SNPs

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA574550
Resource description:
Background: Mapping of allele-specific DNA methylation (ASM) can be a post-GWAS strategy for localizing regulatory sequence polymorphisms (rSNPs). However, the advantages of this approach, and the mechanisms underlying ASM in normal and neoplastic cells, remain to be clarified. Results: We performed whole genome methyl-seq on diverse normal cells and tissues and three types of cancers (multiple myeloma, lymphoma, glioblastoma multiforme). After excluding imprinting, the data pinpointed 15,114 high-confidence ASM differentially methylated regions (DMRs), of which 1,842 contained SNPs in strong linkage disequilibrium or coinciding with GWAS peaks. ASM frequencies were increased 5 to 9-fold in cancers vs. matched normal tissues, due to widespread allele-specific hypomethylation and focal allele-specific hypermethylation in poised chromatin. Cancers showed increased allele switching at ASM loci, but destructive SNPs in specific classes of CTCF and transcription factor (TF) binding motifs were similarly correlated with ASM in cancer and non-cancer. Rare somatic mutations in these same motif classes tracked with de novo ASM in the cancers. Allele-specific TF binding from ChIP-seq was enriched among ASM loci, but most ASM DMRs lacked such annotations, and some were found in otherwise uninformative “chromatin deserts”. Conclusions: ASM is increased in cancers but occurs by a shared mechanism involving rSNPs in CTCF and TF binding sites in normal and neoplastic cells. Dense ASM mapping in normal plus cancer samples reveals candidate rSNPs that are difficult to find by other approaches. Together with GWAS data, these rSNPs can nominate specific transcriptional pathways in susceptibility to autoimmune, neuropsychiatric, and neoplastic diseases. 
Overall design: To analyze complete methylomes in 65 primary non-neoplastic and 16 primary neoplastic samples, plus the GM12878 LCL, WGBS was performed at the New York Genome Center (NYGC), MNG Genetics (MNG) and the Genomics Shared Resource of the Roswell Park Cancer Institute (RPCI). The NYGC used a modified Nextera transposase-based library approach. Briefly, genomic DNA was first tagmented using the Nextera XT transposome, and end repair was performed using 5mC. After bisulfite conversion, Illumina adapters and custom bisulfite-converted adapters were attached by limited-cycle PCR. Two separate libraries were prepared and pooled for each sample to limit the duplication rate, and were sequenced on the Illumina X system (150 bp paired-end). WGBS performed at MNG used the Illumina TruSeq DNA Methylation Kit for library construction according to the manufacturer’s instructions and generated 150 bp paired-end reads on an Illumina NovaSeq machine. WGBS performed at RPCI used the ACCEL-NGS Methyl-Seq DNA Library kit (Swift Biosciences) for library construction and generated 150 bp paired-end reads on an Illumina NovaSeq. After trimming for low-quality bases (Phred score […] 0.05 and SNPs that deviated significantly from Hardy-Weinberg equilibrium based on exact tests corrected for multiple tests (FDR<0.05 by HardyWeinberg R package). C/T and G/A SNPs were assessed after filtering out reads mapping to the C/T strand. ASM calling was performed after separating the SNP-containing reads by allele. After the Bismark methylation extractor was applied, CpG methylation calls by allele were retrieved using allele-tagged read IDs. Paired reads with ambiguous SNP calling (i.e., called as REF allele on one paired end and ALT allele on the other) were discarded. For Nextera WGBS, because the fill-in reaction using 5mC following DNA tagmentation affects the first 10 base pairs (bp) at the 5’ end of read 2, methylation calls for Cs mapping to these bases were not considered.
In addition, a slight methylation bias due to random priming and specific to each library kit was observed within the last 2 bp at the 3’ end of both reads for Nextera WGBS, within the first 10 bp at the 5’ end of both reads and the last 2 bp at the 3’ end of read 2 for TruSeq WGBS, and within the first 10 bp at the 5’ end of read 2 for ACCEL-NGS WGBS. Therefore, methylation calls in these windows were ignored.
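The kit-specific windows described above can be written down as a small lookup table. The Python sketch below is illustrative only: the table `IGNORE_WINDOWS`, the function `keep_call`, and the lowercase kit labels are hypothetical names, not part of the authors' pipeline. For Nextera read 2 it merges the 5mC fill-in window with the bias window.

```python
# (kit, read number) -> (bases ignored at the 5' end, bases ignored at the 3' end)
# Values follow the windows stated in the text; the names are illustrative assumptions.
IGNORE_WINDOWS = {
    ("nextera", 1): (0, 2),     # last 2 bp at 3' end of both reads
    ("nextera", 2): (10, 2),    # 5mC fill-in: first 10 bp at 5' end of read 2
    ("truseq", 1): (10, 0),     # first 10 bp at 5' end of both reads
    ("truseq", 2): (10, 2),     # plus last 2 bp at 3' end of read 2
    ("accel-ngs", 1): (0, 0),   # no window stated for read 1
    ("accel-ngs", 2): (10, 0),  # first 10 bp at 5' end of read 2
}


def keep_call(kit: str, read_number: int, pos: int, read_length: int) -> bool:
    """Return True if a methylation call at 0-based position `pos`
    (counted from the 5' end of the read) lies outside the ignored windows."""
    skip_5p, skip_3p = IGNORE_WINDOWS[(kit.lower(), read_number)]
    return skip_5p <= pos < read_length - skip_3p
```

Bismark's methylation extractor exposes `--ignore`, `--ignore_r2`, `--ignore_3prime` and `--ignore_3prime_r2` options for this kind of positional trimming; the text does not state whether the authors applied the windows that way or in a separate filtering step.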

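Background: Mapping of allele-specific DNA methylation (ASM) can serve as a post-GWAS (genome-wide association study) strategy for localizing regulatory sequence polymorphisms (rSNPs). However, the advantages of this approach, and the mechanisms underlying ASM in normal and neoplastic cells, remain to be clarified. Results: We performed whole-genome methylation sequencing on diverse normal cells and tissues and three types of cancer (multiple myeloma, lymphoma, glioblastoma multiforme). After excluding imprinted regions, the study identified 15,114 high-confidence ASM differentially methylated regions (DMRs), of which 1,842 contained single nucleotide polymorphisms (SNPs) in strong linkage disequilibrium or coinciding with GWAS peaks. ASM frequencies were 5- to 9-fold higher in cancers than in matched normal tissues, owing to widespread allele-specific hypomethylation and focal allele-specific hypermethylation in poised chromatin. Allele switching at ASM loci was more frequent in cancers, but destructive SNPs in specific classes of CCCTC-binding factor (CTCF) and transcription factor (TF) binding motifs were similarly correlated with ASM in cancer and non-cancer samples. Rare somatic mutations in these same motif classes tracked with de novo ASM in the cancers. Allele-specific TF binding signals from chromatin immunoprecipitation sequencing (ChIP-seq) were enriched among ASM loci, but most ASM DMRs lacked such annotations, and some lay in otherwise unannotated “chromatin deserts”. Conclusions: ASM is increased in cancers, but it arises through a shared mechanism involving rSNPs in CTCF and TF binding sites in both normal and neoplastic cells. Dense ASM mapping in normal plus cancer samples identifies candidate rSNPs that are difficult to find by other approaches. Combined with GWAS data, these rSNPs can point to specific transcriptional pathways in susceptibility to autoimmune, neuropsychiatric, and neoplastic diseases.

Study design: To analyze complete methylomes in 65 primary non-neoplastic samples, 16 primary neoplastic samples, and the GM12878 lymphoblastoid cell line (LCL), whole genome bisulfite sequencing (WGBS) was performed at the New York Genome Center (NYGC), MNG Genetics (MNG), and the Genomics Shared Resource of the Roswell Park Cancer Institute (RPCI). The NYGC used a modified Nextera transposase-based library method: genomic DNA was first tagmented with the Nextera XT transposome, and end repair was then performed using 5mC; after bisulfite conversion, Illumina adapters and custom bisulfite-converted adapters were attached by limited-cycle PCR. Two separate libraries were prepared and pooled for each sample to reduce the duplication rate, then sequenced on the Illumina X system (150 bp paired-end). WGBS at MNG used the Illumina TruSeq DNA Methylation Kit according to the manufacturer's instructions and generated 150 bp paired-end reads on an Illumina NovaSeq. WGBS at RPCI used the ACCEL-NGS Methyl-Seq DNA Library kit (Swift Biosciences) and generated 150 bp paired-end reads on an Illumina NovaSeq. Reads were first quality-trimmed to remove low-quality bases (Phred score <20) and adapter sequences, then aligned to the hg19 reference genome with Bismark v0.22.3. SNPs were called with GATK HaplotypeCaller and filtered: only biallelic SNPs with genotype quality ≥30, sequencing depth ≥10 reads, and minor allele frequency ≥0.05 were retained, and SNPs that deviated significantly from Hardy-Weinberg equilibrium (HWE) based on exact tests corrected for multiple testing (false discovery rate (FDR) <0.05, HardyWeinberg R package) were removed. C/T and G/A SNPs were assessed after filtering out reads mapping to the C/T strand.

For ASM calling, SNP-containing reads were first separated by allele; after the Bismark methylation extractor was applied, allele-specific CpG methylation calls were retrieved using allele-tagged read IDs. Paired reads with ambiguous SNP calls (i.e., REF allele on one mate and ALT allele on the other) were discarded. For Nextera WGBS, because the fill-in reaction using 5mC after DNA tagmentation affects the first 10 base pairs (bp) at the 5' end of read 2, methylation calls for Cs in this region were ignored. In addition, each library kit showed a slight methylation bias: within the last 2 bp at the 3' end of both reads for Nextera WGBS; within the first 10 bp at the 5' end of both reads and the last 2 bp at the 3' end of read 2 for TruSeq WGBS; and within the first 10 bp at the 5' end of read 2 for ACCEL-NGS WGBS. Methylation calls in these windows were therefore ignored.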
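The allele-separation step (discard read pairs whose mates disagree on the SNP allele, then tally CpG methylation calls per allele by read ID) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `"REF"`/`"ALT"` labels, and the input layouts are all hypothetical, and real input would come from parsing the aligner and methylation-extractor output.

```python
from collections import defaultdict
from typing import Optional


def pair_allele(mate1: Optional[str], mate2: Optional[str]) -> Optional[str]:
    """Combine per-mate SNP calls ("REF", "ALT", or None if the mate does not
    cover the SNP) into one call for the read pair. Returns None when the pair
    is uninformative or ambiguous (REF on one mate, ALT on the other)."""
    calls = {c for c in (mate1, mate2) if c is not None}
    if len(calls) != 1:
        return None
    return calls.pop()


def methylation_by_allele(pair_calls, cpg_calls):
    """pair_calls: {read_id: (mate1_call, mate2_call)}  (hypothetical layout)
    cpg_calls: iterable of (read_id, cpg_position, is_methylated)
    Returns {allele: {cpg_position: [methylated_count, total_count]}}."""
    allele_of = {rid: pair_allele(*calls) for rid, calls in pair_calls.items()}
    counts = {"REF": defaultdict(lambda: [0, 0]),
              "ALT": defaultdict(lambda: [0, 0])}
    for rid, pos, is_meth in cpg_calls:
        allele = allele_of.get(rid)
        if allele is None:  # untagged, uninformative, or ambiguous pair
            continue
        counts[allele][pos][0] += int(is_meth)
        counts[allele][pos][1] += 1
    return {allele: dict(per_cpg) for allele, per_cpg in counts.items()}
```

The per-allele counts at each CpG are the raw material for a downstream allelic comparison; the statistical test used to call ASM is not described in this section and is left out of the sketch.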
Created:
2019-09-23