Whole genome sequencing of M. tuberculosis clinical isolates from Mozambique

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26689

Description:

Whole genome sequencing provides comprehensive data on the evolution and dynamics of bacterial populations. Moreover, owing to its high resolution, it is gaining popularity as a cost-effective alternative for the clinical diagnosis of pathogens, from drug-resistance prediction to real-time epidemiology. Computational analysis of the massive amount of data generated is crucial to achieving accurate results, and bioinformatic pipelines need to be comprehensively evaluated to guarantee that analysis outcomes represent real biological traits. The effect of contaminant DNA in whole genome sequencing data has usually been overlooked, because assumptions about the purity of samples, or about the robustness of bioinformatic tools, are commonly made. In this work, we analyzed an extensive dataset comprising both in silico generated data and clinical samples of Mycobacterium tuberculosis. By comparing a standard analysis pipeline with two contamination-aware pipelines, we evaluated the impact of contaminant DNA on whole genome sequencing outcomes. We show that contamination is a common phenomenon across different studies and sample types, and demonstrate that it can be a major source of error, often leading to miscalculations of allele frequencies, drug-resistance predictions and transmission estimates. We encourage the use of contamination-aware analysis pipelines in order to achieve accurate and reliable results from whole genome sequencing data.
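
As a rough illustration of how contaminant reads can distort an allele frequency, the following minimal sketch recomputes the frequency at a single site before and after non-target reads are added. All read counts and the function name `allele_frequency` are hypothetical and chosen only for illustration; this is not the pipeline used in the study.

```python
def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# Hypothetical site: 100 M. tuberculosis reads, all carrying the alternate allele.
mtb_alt, mtb_total = 100, 100

# Hypothetical contaminant reads that map to the same site with the reference base.
contaminant_ref = 25

clean_af = allele_frequency(mtb_alt, mtb_total)
contaminated_af = allele_frequency(mtb_alt, mtb_total + contaminant_ref)

print(f"without contaminant reads: {clean_af:.2f}")
print(f"with contaminant reads:    {contaminated_af:.2f}")
```

In this toy case a variant that is fixed in the isolate appears at a frequency of 0.80 once the contaminant reads are counted, so, depending on the calling threshold, it could be reported as a mixed or minority variant.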

Created:

2019-07-01