An optimization of 16S community amplicon analysis using mock communities

Indexed in NIAID Data Ecosystem, 2026-03-10

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP091609

Resource summary:

Diversity of complex microbial communities can be rapidly assessed by community amplicon sequencing of marker genes (e.g., 16S), often yielding many thousands of DNA sequences per sample. However, analysis of community amplicon sequencing data requires multiple computational steps that affect the outcome of the final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in QIIME 1.9.1. Based on this exploration, we demonstrate a workflow optimization, which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assigners performed with similar accuracy, the appropriate similarity threshold for defining OTUs depended on the method used for OTU picking. Our "default" analysis in QIIME overestimated mock community diversity by at least a factor of ten, compared to the optimized analysis, which correctly characterized the taxonomic composition of the mock communities while still overestimating OTU diversity by about a factor of two. Though observed relative abundances of mock community member taxa were approximately correct, most taxa were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and by the presence of a low-quality base call that resulted in sequence truncation during quality filtering. Low-quality base calls occurred at "G" positions most of the time and were also associated with a preceding "TTT" trinucleotide motif. Environmental diversity estimates were reduced by about 40%, from 2508 to 1533 OTUs, when comparing output from the default and optimized workflows.
We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are both important steps for accurate estimation of microbial community diversity.
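The two steps the abstract singles out, strict quality filtering of raw reads and filtering of the raw OTU table, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' QIIME 1.9.1 workflow: the function names, the rule that a read is cut at its first call with a Phred score at or below a threshold and discarded if it keeps less than a given fraction of its length, the relative-abundance cutoff for OTUs, and all threshold values are hypothetical. The toy read and toy OTU table are invented; only the 2508 and 1533 OTU counts come from the abstract.

```python
"""Minimal sketch of two filtering steps discussed in the abstract.

All names, data layouts and thresholds are hypothetical illustrations,
not QIIME 1.9.1's API or the parameter values used in the study.
"""
from collections import Counter
from typing import Optional


def truncate_at_low_quality(seq: str, quals: list[int], max_bad_phred: int = 3,
                            min_fraction: float = 0.75) -> tuple[Optional[str], Optional[int]]:
    """Cut a read at its first base call with Phred score <= max_bad_phred.

    Returns (kept sequence, index of the low-quality call). The kept sequence is
    None when the truncated read is shorter than min_fraction of the original.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for i, q in enumerate(quals):
        if q <= max_bad_phred:
            kept = seq[:i]
            return (kept if len(kept) >= min_fraction * len(seq) else None), i
    return seq, None  # no low-quality call: keep the whole read


def truncation_context(reads: list[tuple[str, list[int]]],
                       max_bad_phred: int = 3) -> tuple[Counter, int]:
    """Tally the base at each truncation point and how often 'TTT' precedes it."""
    bases: Counter = Counter()
    ttt_before = 0
    for seq, quals in reads:
        _, i = truncate_at_low_quality(seq, quals, max_bad_phred)
        if i is None:
            continue  # read had no low-quality call
        bases[seq[i]] += 1
        if seq[max(0, i - 3):i] == "TTT":
            ttt_before += 1
    return bases, ttt_before


def filter_otu_table(counts: dict[str, int], min_fraction: float) -> dict[str, int]:
    """Keep OTUs whose share of all reads in the table is at least min_fraction."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {otu: n for otu, n in counts.items() if n / total >= min_fraction}


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in observed OTU count between two workflows."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Toy read (made up): the Phred-2 call at index 10 is a 'G' preceded by 'TTT'.
    read = ("ACGTACGTTTGCA", [30] * 10 + [2, 30, 30])
    print(truncate_at_low_quality(*read))  # ('ACGTACGTTT', 10)
    print(truncation_context([read]))      # (Counter({'G': 1}), 1)
    print(filter_otu_table({"otu1": 900, "otu2": 95, "otu3": 5}, min_fraction=0.01))
    # {'otu1': 900, 'otu2': 95}
    # OTU counts reported in the abstract: 2508 (default) -> 1533 (optimized).
    print(round(percent_reduction(2508, 1533), 1))  # 38.9
```

The last line shows that the reported drop from 2508 to 1533 OTUs is roughly 38.9%, consistent with the abstract's "about 40%".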

Created:

2017-09-17