Data from: Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels

Source: DataONE. Updated 2012-08-13; indexed 2024-06-27.

Download link:

https://search.dataone.org/view/null

Description:

Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (maxcount=7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
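
The validation step described in the abstract can be illustrated with a minimal sketch. It assumes the two distance metrics are Hamming distance (substitutions only) and Levenshtein edit distance (substitutions, insertions, and deletions); the abstract does not name them. The function names and the example tags are hypothetical and are not taken from the authors' software package.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions; defined only for equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def violating_pairs(tags, metric, min_distance):
    """Return every pair of tags closer than min_distance under the given metric."""
    return [
        (a, b, metric(a, b))
        for a, b in combinations(tags, 2)
        if metric(a, b) < min_distance
    ]


if __name__ == "__main__":
    # Hypothetical tags chosen for illustration only.
    tags = ["ACGTAC", "CGTACA", "TTGGCC"]
    print(hamming("ACGTAC", "CGTACA"))      # 6: every position differs
    print(levenshtein("ACGTAC", "CGTACA"))  # 2: one deletion plus one insertion
    print(violating_pairs(tags, levenshtein, 3))
```

The first two tags differ at every position, so a substitution-only check accepts them, yet a single deletion followed by a single insertion converts one into the other. This is the kind of case in which a tag set that looks valid under a substitution-only metric fails once indels are considered.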

Created:

2012-08-13
