P. falciparum-specific Genomic Context Protein-Binding Microarray
Source: NIAID Data Ecosystem. Indexed: 2026-03-14
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA947215
Resource description:
Development of the human malaria parasite Plasmodium falciparum is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites to regulate target genes are still largely unknown. To address TF target specificity, we investigated the binding of two TF subsets that bind either CACACA or GTGCAC, and further characterized PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA, respectively). We interrogated the impact of DNA sequence context and the chromatin landscape on P. falciparum TF binding using high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility.

We determined that DNA sequence context does not greatly impact binding site selection for CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization contribute to differential binding. In contrast, GTGCAC-binding TFs prefer different sequence contexts, DNA shape profiles, and chromatin dynamics. Finally, we find that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions in vivo due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.

## Overall design

A 4x180k high-density Genomic Context Protein-Binding Microarray (gcPBM) representing over 24,000 unique genomic regions from the P. falciparum genome. This array design was used to test eight specific DNA-binding domains (DBDs) against motif-containing sites from the P. falciparum genome. Using position weight matrix (PWM) data from published work (De Silva et al. 2008; Campbell et al. 2010), all instances of each motif of interest (i.e., CACACA, GTGCAC, GTAC, and TGCATGCA) were identified in the P. falciparum genome (Plasmodium falciparum 3D7 strain, genome release v38) using a motif E-score cutoff of >0.45. Only intergenic regions (excluding telomeric regions) were used for this gcPBM design.

The numbers of 36-bp genomic regions (probes) containing each motif recognized by the eight DBDs are as follows: 2848 probes with putative sites for PF3D7_0420300_D1 (500 negative controls); 4251 probes with putative sites for AP2-LT_D1 (500 negative controls); 3864 probes with putative sites for PF3D7_1305200_D1 (500 negative controls); 4321 probes with putative sites for AP2-HC_D1 (500 negative controls); 1459 probes with putative sites for SIP2_D1 (500 negative controls); 3742 probes with putative sites for AP2-I_D3 (500 negative controls); 8998 probes with putative sites for AP2-G_D1 (1000 negative controls); and 1059 probes with putative sites for AP2-EXP_D1 (1000 negative controls). Any sequence containing another instance of the centered motif in its left or right flank was mutated to prevent multiple binding sites per 36-bp window. HDP1 motif-specific genomic sequences were not part of the initial gcPBM design, because HDP1 was identified and characterized only after that design was completed; they were added afterwards.

Because the CACACA and GTGCAC PWMs are similar, some genomic DNA sequences led to redundant probe designs. These duplicates were discarded, leaving only one instance of each sequence. After discarding redundant probe designs across motif types, the total number of probes per motif group is as follows: 9388 probes with putative CACACA sites (plus 1834 CACACA negative controls), 1394 probes with putative GTGCAC sites (plus 736 GTGCAC negative controls), 8998 probes with putative GTAC sites (plus 620 GTAC negative controls), and 1059 probes with putative TGCATGCA sites (plus 612 TGCATGCA negative controls). Counting putative-site probes and negative controls together, the P. falciparum gcPBM design reached a total of 24,641 unique genomic regions.

Each DNA probe was replicated in random areas of the microarray surface (8 replicates per sequence for CACACA/GTGCAC and 6 replicates per sequence for GTAC/TGCATGCA), which brought the total number of DNA probes to 174,550 spots on a 4x180k microarray (Agilent Technologies). Additional spots on the array were set aside for control grid alignment, microarray scanning, and downstream analysis.
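
The probe and spot totals quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid, not part of the deposited data or the authors' pipeline. It assumes that the negative controls in each motif group are counted in addition to the putative-site probes, and that every unique sequence (controls included) is printed with the group's replicate count. The names `MOTIF_GROUPS`, `unique_regions`, and `total_spots` are hypothetical and chosen for illustration.

```python
# Per-motif-group figures as stated in the description (after redundancy removal):
# (putative-site probes, negative controls, replicates per sequence).
# Assumption: negative controls are counted in addition to putative-site probes.
MOTIF_GROUPS = {
    "CACACA": (9388, 1834, 8),
    "GTGCAC": (1394, 736, 8),
    "GTAC": (8998, 620, 6),
    "TGCATGCA": (1059, 612, 6),
}


def unique_regions(groups):
    """Number of unique sequences: putative-site probes plus negative controls."""
    return sum(sites + controls for sites, controls, _ in groups.values())


def total_spots(groups):
    """Number of printed spots: each unique sequence times its replicate count."""
    return sum((sites + controls) * reps for sites, controls, reps in groups.values())


if __name__ == "__main__":
    print(unique_regions(MOTIF_GROUPS))  # 24641
    print(total_spots(MOTIF_GROUPS))     # 174550
```

Under these assumptions the sums reproduce both stated figures: 24,641 unique genomic regions and 174,550 spots. This suggests the negative controls are counted separately from the putative-site probes and are replicated like them.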
Created:
2023-03-21



