
Selective Constraints in Experimentally Defined Primate Regulatory Regions

Source: Figshare | Updated: 2016-01-18 | Indexed: 2026-05-11
Download link:
https://figshare.com/articles/dataset/Selective_Constraints_in_Experimentally_Defined_Primate_Regulatory_Regions/149774
Description:
Changes in gene regulation may be important in evolution. However, the evolutionary properties of regulatory mutations are currently poorly understood. This is partly the result of an incomplete annotation of functional regulatory DNA in many species. For example, transcription factor binding sites (TFBSs), a major component of eukaryotic regulatory architecture, are typically short, degenerate, and therefore difficult to differentiate from randomly occurring, nonfunctional sequences. Furthermore, although sites such as TFBSs can be computationally predicted using evolutionary conservation as a criterion, estimates of the true level of selective constraint (defined as the fraction of strongly deleterious mutations occurring at a locus) in regulatory regions will, by definition, be upwardly biased in datasets that are a priori evolutionarily conserved. Here we investigate the fitness effects of regulatory mutations using two complementary datasets of human TFBSs that are likely to be relatively free of ascertainment bias with respect to evolutionary conservation but, importantly, are supported by experimental data. The first is a collection of approximately 2,100 human TFBSs drawn from the literature in the TRANSFAC database, and the second is derived from several recent high-throughput chromatin immunoprecipitation coupled with genomic microarray (ChIP-chip) analyses. We also define a set of putative cis-regulatory modules (pCRMs) by spatially clustering multiple TFBSs that regulate the same gene. We find that a relatively high proportion (~37%) of mutations at TFBSs are strongly deleterious, similar to that at a 2-fold degenerate protein-coding site. However, constraint is significantly reduced in human and chimpanzee pCRMs and ChIP-chip sequences, relative to macaques. We estimate that the fraction of regulatory mutations that have been driven to fixation by positive selection in humans is not significantly different from zero.
We also find that the level of selective constraint in our TFBSs, pCRMs, and ChIP-chip sequences is negatively correlated with the expression breadth of the regulated gene, whereas the opposite relationship holds at that gene's nonsynonymous and synonymous sites. Finally, we find that the rate of protein evolution in a transcription factor appears to be positively correlated with the breadth of expression of the gene it regulates. Our study suggests that strongly deleterious regulatory mutations are considerably more likely (1.6-fold) to occur in tissue-specific than in housekeeping genes, implying that there is a fitness cost to increasing "complexity" of gene expression.
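
The description defines selective constraint as the fraction of strongly deleterious mutations at a locus. One common way to estimate such a fraction, sketched here as an illustration and not necessarily the procedure used for this dataset, compares the substitutions observed in a region with the number expected if the region evolved at a neutral rate. The function names, parameters, and input values below are hypothetical and are not taken from the dataset.

```python
def expected_substitutions(n_sites: int, neutral_rate: float) -> float:
    """Substitutions expected in a region if every site evolved neutrally.

    neutral_rate is an assumed per-site divergence measured in a putatively
    neutral reference sequence (hypothetical input, not from the dataset).
    """
    if n_sites < 0 or neutral_rate < 0:
        raise ValueError("n_sites and neutral_rate must be non-negative")
    return n_sites * neutral_rate


def selective_constraint(observed_subs: float, expected_subs: float) -> float:
    """Estimate constraint as 1 - observed/expected substitutions.

    A value near 0 suggests the region evolves roughly neutrally; a value
    near 1 suggests most new mutations are removed by selection.
    """
    if expected_subs <= 0:
        raise ValueError("expected_subs must be positive")
    return 1.0 - observed_subs / expected_subs


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    expected = expected_substitutions(n_sites=1000, neutral_rate=0.0125)
    constraint = selective_constraint(observed_subs=10, expected_subs=expected)
    print(f"Estimated constraint: {constraint:.2f}")
```

With this estimator, the headline figure in the description would correspond to observing roughly 63% of the substitutions expected under neutrality at TFBSs. The estimate can be negative when a region accumulates more substitutions than the neutral expectation, which is why real analyses typically report confidence intervals alongside it.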

Created: 2016-01-18