
COins database

DataCite Commons | Updated 2024-08-11 | Indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/COins_database/19130465/2
Description:
COins is a database of insect COI-5P sequences. It contains over 532,000 representative sequences from more than 106,000 species, formatted specifically for the QIIME2 software platform. It was developed through a combination of automated and manually curated steps, starting from the insect COI sequences available in the Barcode of Life Data System (BOLD) and selecting those that comply with several standards, including a species-level identification.

seq-degapped.qza: reference sequences.

taxonomy.qza: taxonomy of the reference sequences.

SklearnClassifier_COins_QIIME2_v2022.2.qza (NEW): naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2022.2).

Sequences_metadata1.tsv: identification procedure of the voucher specimens from which the reference sequences were developed. The procedure is reported for each sequence included in COins (BOLD id given in the "BOLDid reference" column) and for all identical sequences within haplotypes that were removed at Step 5 of COins curation (those with no BOLD id in the "BOLDid reference" column). The haplotype to which each sequence belongs is given in the "Haplotype" column; the haplotypes of each species are labeled with increasing numbers. The identification procedure information is derived from the sequence-associated metadata provided by the BOLD system.

Sequences_metadata2.tsv: identical sequences belonging to different species within COins. Each row represents a cluster of identical sequences associated with different species; the sequences in a cluster are labeled with species name and BOLD id.
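The distinction that Sequences_metadata1.tsv draws between retained sequences and identical sequences removed at Step 5 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the column names "BOLDid reference" and "Haplotype" come from the description above, but their exact spelling in the file header, the way a missing BOLD id is encoded (empty cell, "NA", and so on), and the sample rows are all assumptions made for illustration, and the helper name split_by_bold_id is hypothetical.

```python
import csv
import io


def split_by_bold_id(tsv_text, id_column="BOLDid reference"):
    """Split metadata rows by whether a BOLD id is present.

    Rows with a BOLD id are treated as sequences included in COins; rows
    without one are treated as identical sequences removed during curation.
    The set of "missing" markers below is an assumption, not taken from
    the actual file.
    """
    missing_markers = {"", "NA", "N/A", "NONE"}
    kept, removed = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value.upper() in missing_markers:
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


# Made-up sample rows; the ids and the two-column layout are illustrative only.
SAMPLE = (
    "BOLDid reference\tHaplotype\n"
    "EXAMPLE001\t1\n"
    "\t1\n"
    "EXAMPLE002\t2\n"
)

kept, removed = split_by_bold_id(SAMPLE)
print(len(kept), "rows with a BOLD id;", len(removed), "rows without one")
```

To try this on the real file, the text could be read with open("Sequences_metadata1.tsv", encoding="utf-8", newline="").read() and passed to the same function, after checking the actual header names.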

Provider:
figshare
Created:
2022-07-28