Data from: Rapid and accurate taxonomic classification of insect (Class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.bc8pc
Description:
Current methods to identify unknown insect (class Insecta) cytochrome c
oxidase (COI barcode) sequences often rely on difficult-to-define
distance thresholds, sequence similarity cutoffs, or monophyly. Most
methods do not provide a measure of confidence for the taxonomic
assignments they provide. The aim of this study is to use a naïve Bayesian
classifier (Wang et al., 2007) to automate unsupervised taxonomic
assignments for large batches of insect COI sequences such as data
obtained from environmental barcoding using next-generation sequencing
platforms. This method provides rank-flexible taxonomic assignments with
an associated bootstrap support value and it is faster than the
BLAST-based methods commonly used in environmental sequence surveys. We
have developed and rigorously tested the performance of three different
training sets using leave-one-out cross-validation, two field datasets,
and targeted testing of Lepidoptera, Diptera, and Mantodea sequences
obtained from the Barcode of Life Data system. We found that type I error
rates (incorrect taxonomic assignments with high bootstrap support) were
already relatively low but could be lowered further by ensuring that all
query taxa are actually present in the reference database. Choosing
bootstrap support cutoffs according to query length and summarizing
taxonomic assignments to more inclusive ranks can also help to reduce
error while retaining the maximum number of assignments. Additionally, we
highlight gaps in the taxonomic and geographic representation of insects
in public sequence databases that will require further work by taxonomists
to improve the quality of assignments generated using any method.
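
The classification approach described above can be illustrated with a minimal sketch. It is not the authors' software or their training sets. It assumes the general scheme of the naïve Bayesian classifier of Wang et al. (2007): sequences are broken into fixed-length words, each taxon is scored by the summed log-probability of the query words, and bootstrap support is the fraction of resampled word subsets that give the same answer. The word size of 8, the smoothing formula, the subsample size of one eighth of the query words, and all function names are assumptions made for illustration. The sketch assigns at a single rank only, whereas the method described above is rank-flexible.

```python
import math
import random
from collections import defaultdict

K = 8  # word (k-mer) size; assumed here, not stated in the description above


def kmers(seq, k=K):
    """Return the set of unambiguous k-mers (A/C/G/T only) in a sequence."""
    seq = seq.upper()
    words = set()
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if set(w) <= set("ACGT"):
            words.add(w)
    return words


def train(references, k=K):
    """Build word counts from an iterable of (taxon, sequence) pairs."""
    seqs_with_word = defaultdict(int)                   # n(w): sequences containing w
    taxon_word = defaultdict(lambda: defaultdict(int))  # m(w): same, per taxon
    taxon_size = defaultdict(int)                       # M: sequences per taxon
    total = 0                                           # N: all reference sequences
    for taxon, seq in references:
        total += 1
        taxon_size[taxon] += 1
        for w in kmers(seq, k):
            seqs_with_word[w] += 1
            taxon_word[taxon][w] += 1
    return {"n": seqs_with_word, "m": taxon_word, "M": taxon_size,
            "N": total, "k": k}


def log_score(model, words, taxon):
    """Sum of log P(word | taxon) with a word-specific prior as smoothing."""
    score = 0.0
    for w in words:
        prior = (model["n"].get(w, 0) + 0.5) / (model["N"] + 1)
        m = model["m"][taxon].get(w, 0)
        score += math.log((m + prior) / (model["M"][taxon] + 1))
    return score


def best_taxon(model, words):
    """Taxon with the highest score; ties go to the alphabetically first name."""
    return max(sorted(model["M"]), key=lambda t: log_score(model, words, t))


def classify(model, seq, n_boot=100, seed=0):
    """Return (taxon, bootstrap support in [0, 1]) for one query sequence."""
    words = sorted(kmers(seq, model["k"]))
    if not words:
        return None, 0.0  # query too short or fully ambiguous
    rng = random.Random(seed)
    winner = best_taxon(model, words)
    sample_size = max(1, len(words) // model["k"])  # assumed: 1/8 of the words
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(words) for _ in range(sample_size)]
        if best_taxon(model, sample) == winner:
            hits += 1
    return winner, hits / n_boot
```

In use, `train` would be given labelled reference barcodes and `classify` would be called once per query. An assignment would then be kept only if its support meets a cutoff chosen for the query length. The description above recommends length-dependent cutoffs but gives no values, so none are hard-coded here.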
Provider:
Dryad
Created:
2014-02-18



