Combinatorial DNA storage
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP155985
Resource description:
With the world generating digital data at an exponential rate, DNA has emerged as a promising archival medium. It offers a more efficient and long-lasting digital storage solution due to its durability, physical density, and high information capacity. Research in the field includes the development of encoding schemes compatible with existing DNA synthesis and sequencing technologies. Recent studies suggested leveraging the inherent information redundancy of these technologies by using composite DNA alphabets. A major challenge for this approach was the noisy inference process, which prevented the use of large composite alphabets. This paper introduces a novel approach for DNA-based data storage offering a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter represents a subset of shortmers. The nature of the combinatorial alphabets minimizes mix-up errors and ensures robustness of the system. We formally define different combinatorial encoding schemes and investigate their theoretical properties, including information density, reconstruction probabilities, and required synthesis and sequencing multiplicities. We suggest an end-to-end design for a combinatorial DNA storage system including encoding schemes, two-dimensional error correction codes, and reconstruction algorithms. Using in silico simulations, we demonstrate the suggested approach and evaluate different combinatorial alphabets for encoding 10KB messages under different error regimes. The simulations revealed key insights, including that nucleotide substitution errors are easier to manage than shortmer-level insertions and deletions.
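To make the idea of a combinatorial alphabet concrete, the following minimal sketch treats each letter as a K-subset of N distinguishable shortmers, so the alphabet has C(N, K) letters and each letter can carry floor(log2 C(N, K)) bits. The values N = 16 and K = 5, the function names, and the lexicographic ranking scheme are illustrative assumptions; they are not taken from the description above and are not presented as the authors' encoding.

```python
from math import comb


def bits_per_letter(n: int, k: int) -> int:
    """Whole bits one letter can carry when letters are K-subsets of N shortmers."""
    # floor(log2(x)) computed exactly on integers
    return comb(n, k).bit_length() - 1


def index_to_subset(index: int, n: int, k: int) -> tuple:
    """Map an integer to the K-subset of range(n) at that lexicographic rank."""
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range for this alphabet")
    subset = []
    candidate = 0
    remaining = k
    while remaining > 0:
        # Number of subsets that include `candidate` as the next element.
        count = comb(n - candidate - 1, remaining - 1)
        if index < count:
            subset.append(candidate)
            remaining -= 1
        else:
            index -= count
        candidate += 1
    return tuple(subset)


def subset_to_index(subset, n: int) -> int:
    """Inverse of index_to_subset: lexicographic rank of a K-subset."""
    k = len(subset)
    index = 0
    previous = -1
    for position, element in enumerate(sorted(subset)):
        # Count every subset that would have chosen a smaller element here.
        for skipped in range(previous + 1, element):
            index += comb(n - skipped - 1, k - position - 1)
        previous = element
    return index


if __name__ == "__main__":
    n, k = 16, 5  # hypothetical alphabet parameters
    print("letters:", comb(n, k), "bits per letter:", bits_per_letter(n, k))
    letter = index_to_subset(1234, n, k)
    print("letter for value 1234:", letter, "->", subset_to_index(letter, n))
```

Under these assumed parameters a single letter position carries 12 bits, compared with 2 bits for a single nucleotide in a standard four-letter encoding; the two quantities are not directly comparable per base, since each shortmer spans several nucleotides.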
Sequencing coverage was found to be a key factor affecting system performance, and the use of 2D Reed-Solomon error correction significantly improved reconstruction rates. Our experimental proof of concept validated feasibility by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed successful reconstruction and established the robustness of our approach to different error types. Subsampling experiments supported the important role of sampling rate and its effect on overall performance. Our work establishes the promise of combinatorial shortmer encoding for DNA-based data storage while raising theoretical research questions and technical challenges. These include the development of error correction codes for combinatorial DNA, the exploration of optimal sampling rates, and the advancement of DNA synthesis technologies to support combinatorial synthesis. Combining combinatorial principles with error-correcting strategies paves the way for efficient and error-resilient DNA-based storage solutions.
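The dependence of reconstruction on sequencing coverage can be illustrated with a small Monte Carlo sketch. It assumes a deliberately simple model that is not the authors' simulation or reconstruction algorithm: each read samples one of the letter's K shortmers uniformly, a read is replaced by an arbitrary shortmer with probability `error_rate`, and the letter is inferred as the K most frequently observed shortmers. All names and parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_reads(letter, n, num_reads, error_rate, rng):
    """Draw reads for one letter position under the assumed noise model."""
    members = sorted(letter)
    reads = []
    for _ in range(num_reads):
        shortmer = rng.choice(members)
        if rng.random() < error_rate:
            # Assumed mix-up error: the read is reported as an arbitrary shortmer.
            shortmer = rng.randrange(n)
        reads.append(shortmer)
    return reads


def reconstruct_letter(reads, k):
    """Infer the letter as the K most frequently observed shortmers."""
    counts = Counter(reads)
    return frozenset(shortmer for shortmer, _ in counts.most_common(k))


def success_rate(n, k, num_reads, error_rate, trials=500, seed=0):
    """Fraction of random letters reconstructed exactly at a given coverage."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        letter = frozenset(rng.sample(range(n), k))
        reads = simulate_reads(letter, n, num_reads, error_rate, rng)
        if reconstruct_letter(reads, k) == letter:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for coverage in (5, 10, 20, 50, 100):
        rate = success_rate(n=16, k=5, num_reads=coverage, error_rate=0.05)
        print(f"reads per position = {coverage:3d}  exact reconstruction = {rate:.2f}")
```

In this toy model a letter cannot be recovered unless every one of its K shortmers is observed at least once, which is one simple way to see why low sampling rates hurt reconstruction before any error correction is applied.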
Created:
2023-12-14



