Supporting data for "Systematic processing of rRNA gene amplicon sequencing data"
Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100658
Description:
With the advent of high-throughput sequencing, microbiology is increasingly becoming a data-intensive field of science. Because of its low cost, robust databases and established bioinformatic workflows, sequencing of 16S/18S/ITS rRNA gene amplicons, which provides a marker of choice for phylogenetic studies, has become ubiquitous and has grown into the backbone of modern microbial ecology. Many established end-to-end bioinformatic pipelines are available to perform short amplicon sequence data analysis and have proven central to advancing the field of microbial ecology. These pipelines have been partly written for a general audience, which is arguably a main reason for their widespread adoption. However, few options exist for more specialized users who are experienced in code scripting, Linux-based systems and high-performance computing (HPC) environments. For such an audience, existing pipelines can make it difficult to fully leverage modern HPC capabilities and to perform tweaking and optimization operations. Moreover, a wealth of stand-alone software packages that perform specific targeted bioinformatic tasks are increasingly accessible through code repositories and scientific publications, and finding a way to easily integrate these applications into a pipeline is critical given the fast-paced evolution of bioinformatic methodologies. Here we describe AmpliconTagger, a short rRNA marker gene amplicon pipeline coded in a Python framework that enables fine-tuning and integration of virtually any potential rRNA gene amplicon bioinformatic procedure. It is designed to work within an HPC environment, supporting a complex network of job dependencies with a smart-restart mechanism in case of job failure or parameter modifications. As proof of concept, we present end results obtained with AmpliconTagger using 16S, 18S, ITS rRNA short gene amplicons and PacBio long-read amplicon data types as input.
Using a selection of published algorithms for generating Operational Taxonomic Units (OTUs) and Amplicon Sequence Variants (ASVs) and for computing downstream taxonomic summaries and diversity metrics, we demonstrate the performance and versatility of our pipeline for systematic analyses of amplicon sequence data.
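The smart-restart behavior mentioned in the description (re-running only what is needed after a job failure or a parameter change) can be illustrated with a minimal Python sketch. This is not AmpliconTagger's actual implementation: the job names, parameters, and the functions `param_hash` and `jobs_to_run` are hypothetical, and the sketch assumes completed jobs are tracked as a simple mapping from job name to a fingerprint of its parameters. It requires Python 3.9+ for `graphlib`.

```python
import hashlib
import json
from graphlib import TopologicalSorter


def param_hash(params):
    """Return a stable fingerprint of a job's parameters (hypothetical helper)."""
    payload = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def jobs_to_run(jobs, state):
    """Return the jobs that must be (re)run, in dependency order.

    jobs:  {name: {"deps": [names], "params": {...}}}
    state: {name: fingerprint} for jobs that completed in a previous run
    """
    # TopologicalSorter expects each node mapped to its predecessors.
    graph = {name: set(spec["deps"]) for name, spec in jobs.items()}
    stale = []
    for name in TopologicalSorter(graph).static_order():
        spec = jobs[name]
        # A job is stale if it never completed or its parameters changed...
        changed = state.get(name) != param_hash(spec["params"])
        # ...or if anything it depends on has to be re-run.
        upstream_stale = any(dep in stale for dep in spec["deps"])
        if changed or upstream_stale:
            stale.append(name)
    return stale


# Hypothetical four-step amplicon workflow (names are illustrative only).
jobs = {
    "qc": {"deps": [], "params": {"min_quality": 20}},
    "cluster": {"deps": ["qc"], "params": {"identity": 0.97}},
    "classify": {"deps": ["cluster"], "params": {"database": "example_db"}},
    "summarize": {"deps": ["classify"], "params": {}},
}

# Pretend every job completed once with the parameters above.
state = {name: param_hash(spec["params"]) for name, spec in jobs.items()}

# Changing one parameter invalidates that job and everything downstream of it.
jobs["cluster"]["params"]["identity"] = 0.99
print(jobs_to_run(jobs, state))
```

In this sketch, a failed job is modeled simply as a job with no entry in `state`, so it and its downstream jobs are selected again while upstream results are reused.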
Provider:
GigaScience Database
Created:
2019-10-17



