Identification of low-frequency variants in HIV populations using next-generation sequencing data

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://www.ncbi.nlm.nih.gov/sra/ERP004411

Description:

A single patient infected with HIV carries a large population of related but diverse viral strains, usually described as a quasi-species. This population diversity complicates both drug-resistance profiling and the development of broadly reactive HIV vaccines. Early detection of minority variants is critical for identifying novel mutations that contribute to drug resistance. Next-generation sequencing can improve our ability to resolve genetic diversity by sampling HIV populations more deeply, but the inherent error rate of the sequencing platform sets a lower bound on the frequency at which variants can be detected. We have developed an automated computational pipeline that reliably separates sequencing errors from real variation in HIV quasi-species sequenced with Illumina technology at >50,000-fold coverage, using a control population of five different HIV genomic sequences present in the sample at known concentrations. Our workflow automates quality control, alignment of sequencing reads, re-alignment around insertions and deletions, and classification of sequencing artifacts using a novel, crowd-sourced classification algorithm that takes into account sequence quality, alignment quality, and the uniqueness of neighboring nucleotides. This enables us to reliably distinguish minority variants down to a lower bound of 0.1% clonal variation from common sequencing errors while minimizing false-positive variant calls. This dataset includes a control library as well as HIV samples obtained from five patients, which were used to evaluate the algorithm. Source code and documentation are available at https://github.com/hbc/projects/tree/master/jl_hiv.
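
As a rough illustration of why coverage and error rate bound detection: at 50,000-fold coverage a variant at 0.1% frequency is supported by about 50 reads, while a per-base error rate of 0.1% (Phred Q30) spread over the three possible substitutions produces on the order of 17 erroneous reads for any one alternate base. The Python sketch below is not the pipeline's crowd-sourced classifier; it swaps in a simpler, hypothetical test that asks whether an observed alternate-allele count exceeds what a binomial error model predicts. The function names, the even split of errors across the three alternate bases, and the significance cutoff alpha=1e-6 are assumptions for illustration; only the 0.1% frequency floor comes from the description above.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at i (requires 0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed upward from k so tiny tails stay precise."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Beyond the mean the terms only shrink, so stop once they are negligible.
        if i > n * p and term < total * 1e-16:
            break
    return min(total, 1.0)


def call_minor_variant(alt_count: int, depth: int, error_rate: float,
                       min_freq: float = 0.001, alpha: float = 1e-6):
    """Hypothetical caller: accept an alternate base only if it is above the
    frequency floor AND unlikely to be explained by sequencing error alone.

    Assumes errors are spread evenly over the three possible alternate bases.
    Returns (is_variant, observed_frequency, p_value).
    """
    freq = alt_count / depth
    p_err = error_rate / 3  # chance that a read shows this particular wrong base
    p_value = binom_sf(alt_count, depth, p_err)
    return freq >= min_freq and p_value < alpha, freq, p_value


if __name__ == "__main__":
    # 60 alternate reads at 50,000x (0.12%) with Q30 base qualities
    print(call_minor_variant(60, 50000, phred_to_error(30)))
    # The same count with Q20 qualities is well within the expected error noise
    print(call_minor_variant(60, 50000, phred_to_error(20)))
```

The second call shows why a fixed frequency cutoff alone is not enough: 0.12% clears the 0.1% floor, but with Q20 qualities roughly 167 erroneous reads per alternate base would be expected, so 60 reads are not evidence of a real variant. A control population at known concentrations, as described above, would be a natural source for an empirically measured error rate in place of reported quality scores.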

Created:

2021-02-04