
Data Sheet 2_Deep learning-based investigation of chloroplast translation regulatory sequences.csv

Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://figshare.com/articles/dataset/Data_Sheet_2_Deep_learning-based_investigation_of_chloroplast_translation_regulatory_sequences_csv/30857228
Description:
Understanding the architecture of translational regulatory sequences in diverse chloroplasts is critical for advancing synthetic biology and genetic engineering. In this study, a hybrid deep learning model combining convolutional neural network (CNN), long short-term memory (LSTM), Attention, and Residual architectures was developed to classify and analyse two datasets: 5′ untranslated region sequences from plants and algae, and the sequences with and without Shine-Dalgarno (SD) motifs from both groups. Using 300-nucleotide leader sequences upstream of the start codon as input, the model achieved strong prediction performance for both taxonomic origin and the presence or absence of SD motifs. However, a small subset of plant and algal sequences exhibited algal-like and plant-like patterns, respectively—an encouraging finding for identifying functional heterologous sequences from one group for use in the other group’s genome. The results further revealed significant differences in the plastid leader sequences between the datasets (Plants vs. Algae and SDs vs. without SDs), emphasising distinct features in the first 30 bp upstream of the start codon. This study proposes two potential strategies for introducing heterologous leader sequences in algal plastome engineering: (1) employing plant-derived leader sequences with algal-like patterns tailored to specific algal strains, and (2) constructing hybrid leader sequences harbouring SD motifs by fusing algae-specific ~30 bp upstream regions with their respective plant-derived distal regions. As the first deep learning model to analyse chloroplast translational regulatory sequences, the findings offer valuable guidance for identifying and predicting heterologous leader sequences in plants and algae.
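The input described above (300-nucleotide leaders, with attention on the ~30 bp next to the start codon) can be illustrated with a minimal sketch. This is not the study's code: the one-hot encoding, the left-padding choice, the `GGAG` core used as an SD-like motif, and the function names `one_hot` and `has_sd_like_motif` are all assumptions made for illustration, since the description does not state how sequences were encoded or how SD motifs were defined.

```python
# Illustrative sketch only: encoding and motif rule are assumptions,
# not the method used in the study.
LEADER_LEN = 300      # leader length stated in the description
PROXIMAL_LEN = 30     # start-codon-proximal region highlighted in the description
ALPHABET = "ACGT"
SD_CORE = "GGAG"      # assumed SD-like core; the study's definition is not given


def one_hot(seq, length=LEADER_LEN):
    """Encode a leader as a length x 4 one-hot matrix (lists of 0/1).

    Shorter sequences are left-padded with all-zero rows so that the 3' end
    (next to the start codon) stays aligned; unknown bases also map to zeros.
    """
    seq = seq.upper().replace("U", "T")[-length:]
    rows = [[0, 0, 0, 0] for _ in range(length - len(seq))]
    for base in seq:
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def has_sd_like_motif(seq, window=PROXIMAL_LEN, core=SD_CORE):
    """Return True if the assumed SD-like core occurs in the last `window` nt."""
    proximal = seq.upper().replace("U", "T")[-window:]
    return core in proximal


if __name__ == "__main__":
    leader = "A" * 280 + "TTTGGAGGTTTTTTATTTAA"  # made-up 300-nt example
    matrix = one_hot(leader)
    print(len(matrix), len(matrix[0]), has_sd_like_motif(leader))
```

A matrix of this shape is the kind of input a CNN/LSTM classifier typically consumes; the labels (plant vs. algae, SD vs. no SD) would come from the dataset itself.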
Created:
2025-12-11