Saturation mutagenesis of disease-associated regulatory elements

NIAID Data Ecosystem, indexed 2026-04-25

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP187107

Description:

The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we performed saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitution and deletion mutations. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or various integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and also comprise a gold-standard dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.

Overall design: We set out to generate variant-specific activity maps for 20 disease-associated regulatory elements, including ten promoters (of TERT, LDLR, HBB, HBG, HNF4A, MSMB, PKLR, F9, FOXE1 and GP1BB) and ten enhancers (of SORT1, ZRS, BCL11A, IRF4, IRF6, MYC (2x), RET, TCF7L2 and ZFAND3), together with one ultraconserved enhancer (UC88). Specifically, we used massively parallel reporter assays (MPRAs) to perform saturation mutagenesis on each of these regulatory elements. Altogether, we empirically measured the functional effects of over 30,000 SNVs or single nucleotide deletions. We focused primarily on regulatory sequences in which specific mutations are known to cause disease, both for their clinical relevance and to provide positive control variants.
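The mutation space that saturation mutagenesis aims to cover (every single nucleotide substitution plus every single-base deletion) can be illustrated with a minimal sketch. The function name, the `"-"` deletion symbol, and the toy sequence are hypothetical and are not part of the deposited data or the authors' pipeline.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_nucleotide_variants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, ref, alt) for every SNV and single-base deletion.

    Positions are 1-based. A deletion is reported with alt == "-"
    (an illustrative convention, not one taken from the dataset).
    """
    for pos, ref in enumerate(seq.upper(), start=1):
        # Three possible substitutions at each position.
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt
        # One single-base deletion at each position.
        yield pos, ref, "-"


toy_element = "ACGTTA"  # hypothetical 6 bp sequence, not a real element
variants = list(single_nucleotide_variants(toy_element))
print(len(variants))  # 4 variants per position
```

Each position contributes three substitutions and one deletion, so an element of length n has at most 4n such variants. Deleting either base of a homopolymer run (the `TT` in the toy sequence) yields the same resulting sequence, so the number of distinguishable deletions can be smaller.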
Selected elements were limited to 600 base pairs (bp) for technical reasons related to the mapping of variants to barcodes by subassembly. In addition, we selected only sequences where cell line-based reporter assays were previously established. For each of the 21 regulatory elements, we used error-prone PCR to introduce sequence variation at a frequency of less than 1 change per 100 bp. While error-prone PCR is known to be biased in the types of mutations that are generated (e.g., a preference for transitions and T/A transversions), high library complexities (50k-2M constructs per target) allowed us to capture nearly all possible SNVs as well as many single base pair deletions with multiple independent constructs per variant. To distinguish the individual amplification products, we incorporated 15 or 20 bp random sequence tags 3' of the target region using overhanging primers during the error-prone PCR. We cloned promoter libraries and all but two enhancer libraries (RET in pGL3, ZRS in pGL4Z) into the backbones of slightly modified pGL4.11 (Promega, promoter) or pGL4.23 (Promega, enhancer) vectors. For each MPRA experiment, around 5 million cells were plated and incubated for 24 hours before transfection with the libraries. In each experiment, three independent cultures (replicates) were transfected with the same library. In addition, for LDLR and SORT1, independent MPRA libraries were created, as outlined above, and cells were transfected from a different culture and on a different day. In one case (TERT), the same MPRA library was used for experiments in two different cell types (HEK293T and a glioblastoma cell line). The relative abundance of reporter gene transcripts driven by each promoter or enhancer variant was measured by counting associated 3' UTR tags in amplicons derived from RNA (obtained by targeted RT-PCR), and normalized to the relative abundance of the same tags in plasmid DNA (obtained by targeted PCR).
For all experiments, we excluded tags not matching the tag-to-variant assignment and determined the frequency of a tag in RNA or DNA from high-throughput sequencing experiments based on the number of unique molecular identifiers. We only considered tag sequences observed in both RNA and DNA.
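The tag-level quantification described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `(tag, umi)` input format, the use of a log2 ratio, and the choice to compute relative abundances over the shared tags are all hypothetical, and the filter against the tag-to-variant assignment is omitted for brevity.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_umis(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count unique molecular identifiers (UMIs) per tag from (tag, umi) pairs."""
    seen = defaultdict(set)
    for tag, umi in reads:
        seen[tag].add(umi)  # duplicate reads of the same molecule collapse
    return {tag: len(umis) for tag, umis in seen.items()}


def tag_activity(rna: Dict[str, int], dna: Dict[str, int]) -> Dict[str, float]:
    """log2(relative RNA abundance / relative DNA abundance) per tag.

    Only tags observed in both RNA and DNA are kept. The log2 transform and
    normalizing over the shared tags are assumptions of this sketch.
    """
    shared = [t for t in rna if t in dna]
    rna_total = sum(rna[t] for t in shared)
    dna_total = sum(dna[t] for t in shared)
    return {
        t: math.log2((rna[t] / rna_total) / (dna[t] / dna_total))
        for t in shared
    }


# Hypothetical toy reads: (tag sequence, UMI).
rna_reads = [("AAAC", "u1"), ("AAAC", "u1"), ("AAAC", "u2"),
             ("AAAC", "u3"), ("CCGT", "u1"), ("GGTA", "u9")]
dna_reads = [("AAAC", "u1"), ("AAAC", "u2"),
             ("CCGT", "u1"), ("CCGT", "u2"), ("TTTT", "u5")]

rna_counts = count_umis(rna_reads)
dna_counts = count_umis(dna_reads)
activity = tag_activity(rna_counts, dna_counts)
for tag, value in sorted(activity.items()):
    print(tag, round(value, 3))
```

In this toy input, `GGTA` and `TTTT` are dropped because each appears in only one of the two libraries, mirroring the requirement that a tag be observed in both RNA and DNA.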

Created:

2019-09-24
