Supporting data for "NanoMnT: An STR Analysis Tool for Oxford Nanopore Sequencing Data Driven by a Comprehensive Analysis of Error Profile in STR regions"
Source: DataCite Commons; updated 2025-05-26; indexed 2025-04-15
Download link:
https://gigadb.org/dataset/102658
Description:
Oxford Nanopore sequencing (ONT) is a third-generation sequencing technology that enables cost-effective long-read sequencing with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its use in short tandem repeat (STR) research.
To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that some STR loci are more suitable for Nanopore sequencing than others. While base quality scores of substitution errors within the STR regions were lower than those of correctly sequenced bases, no such pattern was observed for indel errors. Furthermore, choosing the most recent basecaller version and using the super accuracy (SUP) model significantly improved STR sequencing accuracy.
Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and exhibits superior results for 1-bp and 2-bp repeat STRs compared to existing tools. By integrating our findings, we improved STR allele estimation accuracy for Ax10 repeats from 55% to 78%, and up to 85% when excluding loci with unfavorable flanking sequences. Using NanoMnT, we demonstrate the utility of our findings by identifying microsatellite instability (MSI) status in cancer sequencing data.
Provider:
GigaScience Database
Created:
2025-02-03



