Identification of low-frequency variants in HIV populations using next-generation sequencing data

Indexed from NIAID Data Ecosystem on 2026-03-12
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP004411
Description:
A single patient infected with HIV carries a large population of related but diverse viral strains, usually described as a quasi-species. This population diversity complicates both drug-resistance profiling and the development of broadly reactive HIV vaccines. Early detection of minority variants is critical for identifying novel mutations that contribute to drug resistance. Next-generation sequencing has the potential to improve our ability to resolve genetic diversity by sampling HIV populations more deeply, but the inherent error rate of the sequencing platform sets a lower bound on the frequency at which variants can be detected. We have developed an automated computational pipeline that reliably separates sequencing errors from real variation in HIV quasi-species sequenced with Illumina technology at >50,000-fold coverage, using a control population of five different HIV genomic sequences present in the sample at known concentrations. Our workflow automates quality control, alignment of sequencing reads, re-alignment around insertions and deletions, and classification of sequencing artifacts using a novel, crowd-sourced classification algorithm that takes into account sequence quality, alignment quality, and the uniqueness of neighboring nucleotides. This enables us to reliably distinguish minority variants down to a lower bound of 0.1% clonal variation from common sequencing errors while minimizing false-positive variant calls. This dataset includes a control library as well as HIV samples obtained from five patients, which were used to evaluate the algorithm. Source code and documentation are available at https://github.com/hbc/projects/tree/master/jl_hiv.
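The pipeline's own classifier is a crowd-sourced algorithm over sequence quality, alignment quality and neighboring-nucleotide uniqueness, and its code lives in the linked repository; the sketch below does not reproduce it. As a simpler, swapped-in illustration of why the platform error rate sets a lower detection bound, it models erroneous base calls at a site as Binomial(depth, error_rate), estimates error_rate from control positions, and flags a minority variant only when its alternate-read count is implausible under that error model. All function names, the alpha=1e-6 cutoff, reusing the source's 0.1% figure as a default frequency floor, and the example error rates are illustrative assumptions, not values taken from the source.

```python
import math


def binom_logpmf(i: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at i."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing log-space pmf terms."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    mode = math.floor((n + 1) * p)
    if k <= mode:
        # Tail is not small here: use the complement of the lower sum.
        lower = sum(math.exp(binom_logpmf(i, n, p)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Past the mode the terms shrink, so sum upward until they are negligible.
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_logpmf(i, n, p))
        total += term
        if term == 0.0 or term < total * 1e-16:
            break
    return total


def estimate_error_rate(mismatches: int, bases: int) -> float:
    """Hypothetical: per-base error rate from control positions where every
    control strain carries the same base, so any mismatch must be an error."""
    return mismatches / bases


def call_minority_variant(depth: int, alt_count: int, error_rate: float,
                          min_freq: float = 0.001, alpha: float = 1e-6) -> bool:
    """Accept a variant only if its frequency clears min_freq AND its read
    count is implausible under the sequencing-error model (p-value < alpha)."""
    freq = alt_count / depth
    p_value = binom_sf(alt_count, depth, error_rate)
    return freq >= min_freq and p_value < alpha


def min_detectable_frequency(depth: int, error_rate: float,
                             alpha: float = 1e-6) -> float:
    """Smallest alt-read fraction whose error-model p-value falls below alpha."""
    for k in range(depth + 1):
        if binom_sf(k, depth, error_rate) < alpha:
            return k / depth
    return 1.0


if __name__ == "__main__":
    # Assumed control counts: 10 mismatches in 50,000 invariant bases.
    rate = estimate_error_rate(mismatches=10, bases=50_000)
    print(call_minority_variant(50_000, 60, rate))  # True: 0.12% at 50,000x
    print(call_minority_variant(1_000, 5, 0.003))   # False: within error noise
    print(min_detectable_frequency(50_000, rate), min_detectable_frequency(1_000, rate))
```

Because the standard deviation of the error-read fraction scales roughly as sqrt(error_rate / depth), the same error rate yields a lower minimum detectable frequency at 50,000x than at 1,000x, which is the intuition behind sequencing at very high coverage. A real classifier would add per-read features (base quality, mapping quality, local sequence context) on top of a count model like this one.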
Created:
2021-02-04