Exploration of whole genome amplification generated chimeric sequences using long-read sequencing

Source: NIAID Data Ecosystem. Indexed: 2026-03-14

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP423093

Resource description:

Multiple displacement amplification (MDA) has become one of the most commonly used methods of whole genome amplification (WGA) due to the high processivity, strand displacement capacity, and high fidelity of the phi29 DNA polymerase; MDA generates vast amounts of DNA with higher molecular weight (up to 100 kb) and greater genome coverage. With advances in sequencing platforms, MDA-amplified DNA molecules longer than 20 kb can now be sequenced by long-read sequencing. However, one of the challenges is the formation of chimeras, which exist in all MDA products and seriously interfere with downstream analysis of long-read sequencing data from MDA-amplified DNA. Here, we sequenced phi29 DNA polymerase-mediated MDA samples on the PacBio platform and constructed 3rd-ChimeraMiner, a chimera detection pipeline that analyzes long-read sequencing data of MDA products, recognizes chimeras, and integrates them into downstream analysis. Five Continuous Long Read (CLR) datasets and one high-fidelity (HiFi) dataset with different magnification folds were analyzed. The proportions of chimeras were much higher than those in next-generation sequencing reads and rose as the magnification fold increased, ranging from 42% to over 78%. On comparison, 99.92% of the recognized chimeras were shown not to exist in the original genomes. After chimera detection with 3rd-ChimeraMiner, the full-length mapping ratio increased, meaning that more PacBio data could be used in downstream analysis, and on average 97.77% of inversions were removed after chimeras were converted into normal reads. 3rd-ChimeraMiner is efficient and accurate in discovering chimeras from long-read sequencing data of MDA products and is promising for wide use in single-cell sequencing.

Created:

2023-02-19
