
Manual validation finds only ultralong long-read sequencing enables faithful, population-level structural variant calling in Drosophila melanogaster euchromatin

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP577251
Resource description:
The increasing accessibility of long-read sequencing and the rapid development of automated variant callers are enabling the generation of population-level structural variation data. However, the effect of read length on automated variant callers is not well understood, especially for non-human species. Here we show that only ultralong long reads, with read N50s greater than 70 kb, are capable of accurately calling structural variants of any size in Drosophila melanogaster euchromatin. We used Oxford Nanopore Technologies long-read sequencing to sequence eight inbred D. melanogaster strains to extremely high coverage, and we then downsampled the reads to create read pools with different length distributions. We assembled genomes from these read-length pools and used both read-based and assembly-based structural variant callers to call variants in each strain before merging the calls into population-level datasets. We manually validated over 2,300 structural variant calls to assess the accuracy of the calls across the different read-length distributions and to determine the causes of false-positive errors. We find that more than half of all structural-variant-calling errors stem from misaligned reads that contain mobile elements or are located in repetitive and complex regions. Overall, our results show that long reads need to be significantly longer than the repetitive and mobile elements found in the genome in order to accurately call structural variants at the population level.
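
The 70 kb threshold above is expressed as a read N50. As a minimal sketch, assuming the conventional definition (the read length at which reads of that length or longer contain at least half of all sequenced bases), the value could be computed as follows. The function name `read_n50` is hypothetical, and the record does not state which tool the authors used for this calculation.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths, in bases.

    Assumes the conventional definition: walk the reads from longest
    to shortest and report the length of the read at which the running
    total first reaches half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only, not data from the study.
example_pool = [100_000, 80_000, 5_000]
is_ultralong = read_n50(example_pool) > 70_000
```

Because the N50 weights each read by the bases it contributes, a pool can contain many short reads and still have a high N50, which is why it is a more useful summary here than the mean read length.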
Created:
2026-03-11