PHYTOPK28-D1D2: A curated database of 28S rRNA gene D1-D2 domains from eukaryotic organisms dedicated to metabarcoding analyses of marine phytoplankton samples
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-26.
Download link:
https://data.mendeley.com/datasets/mndb4h87yg
Description:
The PHYTOPK28-D1D2 database comprises accession numbers, taxonomic classifications and 28S rDNA (D1-D2 domains) sequences that are available in public DNA databases. The sequences are listed in FASTA format, and each is identified by its accession number and hierarchical taxonomy information. The database was built for the taxonomic annotation of DNA metabarcodes generated from water samples that were collected in six French Mediterranean lagoons once a month between May and September/October 2012 and fractionated into three size ranges (0.7-5 µm, 5-20 µm and 20-100 µm). This metabarcode dataset was deposited in the European Nucleotide Archive under accession number PRJEB18757. The database started from an initial dataset retrieved on April 19, 2013 from the ribosomal DNA database SILVA. Further sequences were added through extensive BLAST searches of the NCBI/GenBank nucleotide database, targeting the main taxonomic divisions among eukaryotic algal and plankton lineages, marine or freshwater, and excluding environmental sequences. The first version of the database, PHYTOPK28-D1D2_v1, was assembled by the end of June 2015. When it was used to compute the annotation of the metabarcode library, it contained 8,753 reference sequences, including more than 3,600 from algal/phytoplanktonic lineages (Chlorophyta, Cryptophyta, Dinophyceae, Haptophyceae, Stramenopiles, Rhodophyta, Euglenozoa, Rhizaria, Glaucocystophyceae) and ~700 from microzooplankton (including ciliates, rotifers and copepods). The database is not claimed to be exhaustive with respect to its purpose. Nor is it warranted to be free of overlooked identification errors, whether these stem from undetected errors in the original deposits in public databases or from missed literature reporting taxonomic changes. It may also lack data that had only recently been released when it was used in June 2015.
The intention is to enrich the database further by adding new (mostly recently released) sequence accessions and to make new database versions available from time to time. Anyone interested in receiving a recently updated database can contact the first author (DG). Reports of errors, omissions or recently released sequences are also welcome and would help this updating effort. The database could be made richer by adding more information on the reference sequences, for example by linking the accession numbers to GenBank database information, by adding and linking the article reference associated with each sequence submission (information that is not always kept up to date in the public DNA databases) and, eventually, the subsequent literature references that led to changes in the taxonomic name or classification of the organisms.
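Because the description says each FASTA record is identified by an accession number and hierarchical taxonomy information, a reader may want to tally reference sequences per lineage. The following is only a minimal sketch. The exact header layout is not documented here, so the code assumes a header of the form `>ACCESSION rank1;rank2;...` (accession, one space, semicolon-separated ranks). The function names `read_fasta`, `split_header` and `count_by_rank` and the demo records are hypothetical and would need adapting to the actual file.

```python
import io
from collections import Counter
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Tuple[str, List[str]]:
    """Split a header into accession and taxonomy ranks.

    ASSUMPTION: the header looks like 'ACCESSION rank1;rank2;...'.
    The real PHYTOPK28-D1D2 layout may differ.
    """
    accession, _, taxonomy = header.partition(" ")
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
    return accession, ranks


def count_by_rank(handle: TextIO, level: int = 1) -> Counter:
    """Count records by the taxon found at position `level` (0-based)."""
    counts: Counter = Counter()
    for header, _seq in read_fasta(handle):
        _accession, ranks = split_header(header)
        taxon = ranks[level] if len(ranks) > level else "unclassified"
        counts[taxon] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from the database.
    demo = io.StringIO(
        ">XX000001 Eukaryota;Chlorophyta;Mamiellophyceae\nACGT\nACGT\n"
        ">XX000002 Eukaryota;Dinophyceae\nGGCC\n"
    )
    for taxon, n in count_by_rank(demo).items():
        print(taxon, n)
```

With the real file, `count_by_rank(open(path))` could be compared against the lineage counts quoted in the description, provided the header assumption holds.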
Created:
2024-01-23



