Direct high-throughput deconvolution of unnatural bases via nanopore sequencing and bootstrapped learning

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://www.ncbi.nlm.nih.gov/sra/ERP166391


Description:

The discovery of synthetic xeno-nucleic acids (XNAs) that can base-pair as unnatural bases (UBs) to expand the genetic alphabet has spawned interest in many applications, from synthetic biology to DNA storage. However, the inability to read XNAs in a direct, high-throughput manner has been a significant limitation for xenobiology. Here we demonstrate that XNA-containing templates can be directly and robustly sequenced (>2.3 million reads/flowcell, similar to DNA controls) on a MinION sequencer from Oxford Nanopore Technologies to obtain signal data that are significantly distinct from DNA controls (>86% of reads, median fold-change >6x). To enable training of machine learning models that deconvolute these signals and basecall XNAs along with natural bases, we developed a framework to synthesize a complex pool of 1,024 UB-containing oligonucleotides with diverse 6-mer sequence contexts and high XNA purity (>90% UB-insertion on average). Bootstrapped models for data preparation, together with data augmentation using spliced XNA reads to provide high context diversity, enabled learning of a generalizable model that calls natural as well as unnatural bases with high accuracy (>80%) and specificity (99%). These results highlight the versatility of nanopore sequencing as a platform for interrogating nucleic acids for xenobiology applications, and its potential to transform the study of genetic material beyond that built from canonical bases.
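As a rough illustration of the pool size quoted above: the description does not say how the 1,024 oligonucleotides were designed, but one reading that matches the number is a 6-mer with a single UB at a fixed position and any natural base at the other five positions, which gives 4^5 = 1,024 contexts. The sketch below enumerates that set under this assumption. The placeholder symbol `X`, the function name `ub_contexts`, and the choice of UB position are all hypothetical and are not taken from the dataset.

```python
from itertools import product

NATURAL = "ACGT"
UB = "X"  # placeholder symbol for the unnatural base (assumption, not from the source)


def ub_contexts(k=6, ub_pos=3):
    """Enumerate k-mers with one UB at index ub_pos and natural bases elsewhere."""
    contexts = []
    for combo in product(NATURAL, repeat=k - 1):
        bases = list(combo)
        bases.insert(ub_pos, UB)  # place the UB at the chosen (hypothetical) position
        contexts.append("".join(bases))
    return contexts


contexts = ub_contexts()
print(len(contexts))  # number of distinct contexts under this assumed design
```

If the actual design placed the UB differently or used several UB positions, the enumeration would change; the sketch only shows that a single fixed UB position in a 6-mer is consistent with the stated count.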

Created:

2025-02-14
