Data from: Quantifying sequence proportions in a DNA-based diet study using Ion Torrent amplicon sequencing: which counts count?

Name: Data from: Quantifying sequence proportions in a DNA-based diet study using Ion Torrent amplicon sequencing: which counts count?
Creator: The University of British Columbia
Published: 2025-04-24 19:37:29
License: No description available

Source: DataCite Commons. Updated: 2025-04-24. Indexed: 2025-04-16.

Download link:

https://doi.library.ubc.ca/10.14288/1.0397766

Description:

Abstract: A goal of many environmental DNA barcoding studies is to infer quantitative information about relative abundances of different taxa based on sequence read proportions generated by high-throughput sequencing. However, potential biases associated with this approach are only beginning to be examined. We sequenced DNA amplified from faeces (scats) of captive harbour seals (Phoca vitulina) to investigate whether sequence counts could be used to quantify the seals’ diet. Seals were fed fish in fixed proportions, a chordate-specific mitochondrial 16S marker was amplified from scat DNA and amplicons sequenced using an Ion Torrent PGM™. For a given set of bioinformatic parameters, there was generally low variability between scat samples in proportions of prey species sequences recovered. However, proportions varied substantially depending on sequencing direction, level of quality filtering (due to differences in sequence quality between species) and minimum read length considered. Short primer tags used to identify individual samples also influenced species proportions. In addition, there were complex interactions between factors; for example, the effect of quality filtering was influenced by the primer tag and sequencing direction. Resequencing of a subset of samples revealed some, but not all, biases were consistent between runs. Less stringent data filtering (based on quality scores or read length) generally produced more consistent proportional data, but overall proportions of sequences were very different than dietary mass proportions, indicating additional technical or biological biases are present. Our findings highlight that quantitative interpretations of sequence proportions generated via high-throughput sequencing will require careful experimental design and thoughtful data analysis.
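
To illustrate the kind of comparison the abstract describes, the following is a minimal sketch of how per-species read proportions might be recomputed under different filtering thresholds. It is not the authors' pipeline: the function name `species_proportions`, the `(species, length, mean_quality)` record layout, the threshold values, and the toy reads are all assumptions made for illustration, and none of the numbers come from the dataset.

```python
from collections import Counter


def species_proportions(reads, min_length=0, min_mean_quality=0.0):
    """Return {species: proportion} for reads that pass both filters.

    `reads` is an iterable of (species, length, mean_quality) tuples.
    This record layout is hypothetical, chosen only for the example.
    """
    counts = Counter(
        species
        for species, length, mean_quality in reads
        if length >= min_length and mean_quality >= min_mean_quality
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads survived filtering
    return {species: n / total for species, n in counts.items()}


# Made-up toy reads, NOT data from the study.
toy_reads = [
    ("species_A", 120, 28.0),
    ("species_A", 95, 22.0),
    ("species_A", 130, 30.0),
    ("species_B", 110, 18.0),
    ("species_B", 125, 27.0),
    ("species_C", 80, 25.0),
]

# No filtering versus a stricter length and quality cut-off.
lenient = species_proportions(toy_reads)
strict = species_proportions(toy_reads, min_length=100, min_mean_quality=25.0)
```

In this toy input, the stricter thresholds remove `species_C` entirely and shift the remaining proportions, which mirrors in miniature how filtering choices can change apparent diet composition when read quality or length differs between species.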

Provider:

The University of British Columbia

Created:

2021-05-21
