Deep learning-based semantic matching of cis-regulatory DNA sequences facilitates the prediction of gene function
Source: DataCite Commons | Updated: 2026-02-19 | Indexed: 2026-04-25
Download link:
https://springernature.figshare.com/articles/dataset/Deep_learning-based_semantic_matching_of_cis-regulatory_DNA_sequences_facilitates_the_prediction_of_gene_function/28931018
Description:
The rich information encoded in cis-regulatory DNA sequences has not been fully exploited for gene function prediction in reverse genetics. Here we show that orthologous cis-regulatory sequences that diverged approximately 160 million years ago share little sequence similarity, yet remarkably retain semantic similarity that can be effectively captured by a deep learning model, PhytoBabel. Although trained solely on orthologous cis-regulatory sequence pairs from 15 angiosperms, PhytoBabel implicitly learned spatio-temporal gene expression patterns, conserved non-coding sequences, semantically similar fragments, and phylogenetic relationships among species. Furthermore, PhytoBabel enables the discovery of evolutionarily unrelated but semantically similar cis-regulatory sequences, facilitating the identification of novel genes with functions of interest. As a proof-of-concept, we identified in maize new somatic embryogenesis-related morphogenic regulators exhibiting semantic similarity to known Arabidopsis morphogenic regulators. By bridging the gap in the cis-regulatory sequence → semantics → gene function information chain, PhytoBabel provides a valuable tool for gene function prediction in reverse genetics.
Provider:
figshare
Created:
2025-05-05



