
Machine learning identification of adenine methylated PAM sequences inhibitory to SaCas9 activity

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP584185
Description:
Cas9 nucleases can be used with single guide RNAs (sgRNAs) as antimicrobials and genome engineering tools in bacteria, yet applications are hindered by an incomplete understanding of Cas9-target interactions. Here, we generate large-scale SaCas9/sgRNA in vivo bacterial activity datasets and train a machine learning model (crisprHAL) to predict SaCas9 activity. Predictive performance was highest when the downstream sequence flanking the canonical NNGRRN PAM motif at positions [+1] and [+2] was included in model training, correlating with high in vivo activity at sites with T-rich dinucleotides in the [+1] and [+2] flanking positions. Strikingly, model predictions and experimentally determined activity in pooled sgRNA experiments in Escherichia coli and Citrobacter rodentium showed significantly reduced SaCas9 activity at sites with 5'-NNGGAT[C]-3' PAM sequences. In vitro, adenine methylation (*A) at 5'-NNGG*AT[C]-3' PAMs reduced SaCas9 activity approximately 10-fold, whereas cytosine methylation (5'-NNGGAT[*C]-3') had no impact on activity. Our results show that a general-purpose machine learning architecture can provide biologically relevant insights into SaCas9-PAM interactions that can better inform activity predictions. Avoidance of adenine-methylated PAM sites by SaCas9 may be a mechanism of self versus non-self discrimination, or may reflect an evolutionary adaptation to counter methylation as an anti-restriction strategy by phage or plasmids.
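As an illustration of the motifs named in the description, the following minimal sketch scans a DNA sequence for NNGRRN PAMs, records the [+1] and [+2] flanking nucleotides, and flags sites matching 5'-NNGGAT[C]-3'. It is not part of crisprHAL or of this dataset. The function name `find_sacas9_sites`, the 21-nt protospacer length, and the forward-strand-only scan are assumptions made for the example.

```python
import re

# Assumption: a 21-nt protospacer sits immediately 5' of the PAM.
PROTOSPACER_LEN = 21

# NNGRRN PAM followed by the [+1] and [+2] flanking nucleotides.
# The lookahead lets overlapping candidate sites all be reported.
PAM_RE = re.compile(r"(?=([ACGT]{2}G[AG][AG][ACGT])([ACGT]{2}))")


def find_sacas9_sites(seq, protospacer_len=PROTOSPACER_LEN):
    """Return candidate SaCas9 sites on the forward strand of seq (hypothetical helper)."""
    seq = seq.upper()
    sites = []
    for match in PAM_RE.finditer(seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        pam, flank = match.group(1), match.group(2)
        sites.append({
            "protospacer": seq[pam_start - protospacer_len:pam_start],
            "pam": pam,
            "flank": flank,  # positions [+1] and [+2]
            # True when the site matches 5'-NNGGAT[C]-3'
            "ggat_c": pam[2:] == "GGAT" and flank[0] == "C",
        })
    return sites


if __name__ == "__main__":
    example = "A" * 21 + "TTGGATCA"
    for site in find_sacas9_sites(example):
        print(site["pam"], site["flank"], site["ggat_c"])
```

A fuller version would also scan the reverse complement. Whether a flagged site is actually methylated depends on the host strain and cannot be determined from sequence alone.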
Created:
2025-08-13