DataSheet1_Incorporation of Data From Multiple Hypervariable Regions when Analyzing Bacterial 16S rRNA Gene Sequencing Data.XLSX

Indexed by NIAID Data Ecosystem on 2026-03-13

Download link:

https://figshare.com/articles/dataset/DataSheet1_Incorporation_of_Data_From_Multiple_Hypervariable_Regions_when_Analyzing_Bacterial_16S_rRNA_Gene_Sequencing_Data_XLSX/19485773

Resource description:

Short-read 16S rRNA amplicon sequencing is a common technique in microbiome research. However, estimates of bacterial community composition can be inaccurate because of amplification bias in the targeted hypervariable region. A potential solution is to sequence and assess multiple hypervariable regions in tandem, yet there is currently no consensus on the appropriate method for analyzing these data. Additionally, many sequence analysis resources exist for data produced on the Illumina platform, but fewer open-source options are available for data from the Ion Torrent platform. Herein, we present an analysis pipeline built on open-source analysis platforms that integrates data from multiple hypervariable regions and is compatible with data produced on the Ion Torrent platform. Using the ThermoFisher Ion 16S Metagenomics Kit and a mock community of twenty bacterial strains, we applied our pipeline to assess the taxonomic classification of six amplicons from separate hypervariable regions (V2, V3, V4, V6-7, V8, V9). We report that different amplicons have different specificities for taxonomic classification, which also has implications for global-level analyses such as alpha and beta diversity. Finally, we use a generalized linear modeling approach to statistically integrate the results from multiple hypervariable regions and apply this methodology to data from a representative clinical cohort. We conclude that examining sequencing results across multiple hypervariable regions provides more taxonomic information than sequencing a single region. Data from multiple hypervariable regions can be combined using generalized linear models to enhance the statistical evaluation of overall differences in community structure and relatedness among sample groups.
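The description does not specify the authors' model (response variable, family, link, or covariates), so the following is only a minimal, hypothetical sketch of the general idea of pooling several hypervariable regions in one generalized linear model. It assumes that the response is the Shannon diversity of each sample as measured by each amplicon, that the model is a Gaussian GLM with identity link (whose maximum-likelihood fit is ordinary least squares), and that the hypervariable region enters as a fixed-effect covariate so the group effect is estimated across all regions at once. The names `shannon`, `solve`, and `fit_gaussian_glm` are illustrative and not part of the published pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gaussian_glm(rows: List[Tuple[str, int, float]],
                     regions: List[str]) -> Dict[str, float]:
    """Fit y ~ intercept + group + region dummies by least squares.

    Hypothetical model: rows are (region, group, y) with group coded 0/1.
    regions[0] is the reference level and gets no dummy column.
    A Gaussian GLM with identity link has the same estimates as OLS.
    """
    def design(region: str, group: int) -> List[float]:
        return [1.0, float(group)] + [1.0 if region == r else 0.0 for r in regions[1:]]

    X = [design(reg, g) for reg, g, _ in rows]
    y = [val for _, _, val in rows]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    names = ["intercept", "group"] + [f"region[{r}]" for r in regions[1:]]
    return dict(zip(names, beta))
```

In use, one would compute `shannon(...)` on each sample's taxon counts for each amplicon (for example V2, V4, and V9), collect `(region, group, H)` tuples, and read the pooled group difference from the `"group"` coefficient. For real analyses, a maintained library such as statsmodels, with proper standard errors and an appropriate family for the response, would be preferable to this hand-rolled solver.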

Created:

2022-03-31