Identification of low-frequency variants in HIV populations using next-generation sequencing data
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP004411
Resource description:
A single patient infected with HIV carries a large population of related but diverse viral strains, usually described as a quasi-species. This population diversity complicates both drug resistance profiling and the development of broadly reactive HIV vaccines. Early detection of minority variants is critical for identifying novel mutations that contribute to drug resistance. Next-generation sequencing has the potential to increase our ability to resolve genetic diversity by sampling HIV populations more deeply, but the inherent error rate of the sequencing platform sets a lower bound on the frequency at which variants can be detected. We have developed an automated computational pipeline that reliably separates sequencing errors from real variation in HIV quasi-species sequenced with Illumina sequencing technology at >50,000-fold coverage, using a control population of five different HIV genomic sequences present in the sample at known concentrations. Our workflow automates quality control, alignment of sequencing reads, re-alignment around insertions and deletions, and classification of sequencing artifacts using a novel, crowd-sourced classification algorithm that takes into account sequence quality, alignment quality, and uniqueness of neighboring nucleotides. This enables us to reliably distinguish minority variants from common sequencing errors down to a lower bound of 0.1% clonal variation while minimizing false positive variant calls. This data set includes a control library as well as HIV samples obtained from five patients, which were used to evaluate the algorithm. Source code and documentation are available at https://github.com/hbc/projects/tree/master/jl_hiv.
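To illustrate why the platform error rate bounds the detectable variant frequency, the following is a minimal sketch of a per-site binomial test. It is not the pipeline's classifier, which according to the description also uses sequence quality, alignment quality, and neighboring-nucleotide uniqueness. The function names, the `alpha` threshold, and the example error rate of 0.05% are assumptions made for illustration only; in practice the error rate would be estimated from a control library such as the one in this data set.

```python
import math


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so that the large binomial
    coefficients at deep coverage do not overflow.
    """
    if k <= 0:
        return 1.0
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n - i + 1)
            + i * log_p
            + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_candidate_variant(alt_reads, depth, error_rate, alpha=1e-3):
    """Flag a site when its non-reference read count is unlikely under errors alone.

    error_rate is a hypothetical per-base error rate (an assumption here);
    alpha is an illustrative significance threshold, not a published value.
    """
    return binom_upper_tail(alt_reads, depth, error_rate) < alpha


if __name__ == "__main__":
    depth = 50_000          # coverage on the order described in the abstract
    error_rate = 0.0005     # assumed value for illustration only
    for alt_reads in (25, 30, 100):
        flagged = is_candidate_variant(alt_reads, depth, error_rate)
        print(alt_reads, alt_reads / depth, flagged)
```

Under these assumed numbers, about 25 erroneous reads are expected per site, so a count near that level is indistinguishable from noise, while a count several times higher is not. A test of this kind treats every site identically, which is one reason a classifier with additional per-read features may separate errors from true variants more reliably.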
Created:
2021-02-04



