
SBDI Sativa curated 16S GTDB database

DataCite Commons | Updated 2025-01-15 | Indexed 2025-04-16
Download link:
https://figshare.scilifelab.se/articles/dataset/SBDI_Sativa_curated_16S_GTDB_database/14869077/2
Resource description:
The data in this repository is the result of vetting 16S sequences from the GTDB database release R06-RS202 (https://gtdb.ecogenomic.org/; Parks et al. 2018) with the Sativa program (Kozlov et al. 2016).

Files for the DADA2 (Callahan et al. 2016) methods `assignTaxonomy` and `addSpecies` are available: gtdb-sbdi-sativa.r06rs202.assignTaxonomy.fna.gz and gtdb-sbdi-sativa.r06rs202.addSpecies.fna.gz.
There is also a fasta file with the original GTDB sequence names: gtdb-sbdi-sativa.r06rs202.fna.gz.
All three files are gzipped fasta files with 16S sequences. In the assignTaxonomy file, sequences are associated with taxonomy hierarchies from domain to genus, whereas the addSpecies file has sequence identities and species names.

Taxonomic annotation of 16S amplicons using this data is available as an optional argument to the nf-core/ampliseq Nextflow workflow from version 2.1: `--dada_ref_taxonomy sbdi-gtdb` (https://nf-co.re/ampliseq; Straub et al. 2020).

The data will be updated roughly once a year, when the GTDB database is updated.

Curation

After download, sequences longer than 2000 base pairs and sequences containing undetermined bases ('N') were removed. Subsequently, the sequences and their reverse complements were aligned to the archaeal and bacterial SSU profiles from Barrnap (https://github.com/tseemann/barrnap) with hmmalign from HMMER (Eddy 2011). Sequences aligning to fewer than 1000 bases of their respective profile in both the forward and the reverse-complementary direction were deleted. Among the sequences passing the above filters, the longest sequence in each genome was kept.
For each species, a maximum of 5 sequences was selected, prioritizing sequences from GTDB species-representative genomes, and longer sequences over shorter ones. The remaining sequences were then analyzed with Sativa (Kozlov et al. 2016), and sequences misclassified at any rank from genus to phylum were removed.
A Perl script for filtering sequences before and after the Sativa analysis can be found in the `scripts` folder of the GitHub repo: https://github.com/biodiversitydata-se/sbdi-gtdb. Run `perl select_seq_sativa.pl --h` for documentation.
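The selection rules described under Curation can be illustrated with a short Python sketch. This is not the project's Perl script and does not reproduce its behavior in detail. It only mirrors the length and 'N' filter, the longest-sequence-per-genome rule, and the cap of 5 sequences per species. The `Record` type, its field names, and all function names are hypothetical, and the hmmalign and Sativa steps are left out.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical record type; the field names are assumptions."""
    seq_id: str           # sequence identifier
    genome: str           # genome the sequence comes from
    species: str          # species label
    representative: bool  # True if from a GTDB species-representative genome
    seq: str              # nucleotide sequence


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from fasta-formatted text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def passes_basic_filters(seq: str, max_len: int = 2000) -> bool:
    """Reject sequences longer than max_len or containing 'N'."""
    return len(seq) <= max_len and "N" not in seq.upper()


def longest_per_genome(records: Iterable[Record]) -> List[Record]:
    """Keep only the longest sequence from each genome."""
    best = {}
    for rec in records:
        current = best.get(rec.genome)
        if current is None or len(rec.seq) > len(current.seq):
            best[rec.genome] = rec
    return list(best.values())


def select_per_species(records: Iterable[Record],
                       max_per_species: int = 5) -> List[Record]:
    """Pick up to max_per_species sequences per species, preferring
    representative genomes first and longer sequences second."""
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec.species].append(rec)
    selected = []
    for species in sorted(by_species):
        ranked = sorted(by_species[species],
                        key=lambda r: (not r.representative, -len(r.seq)))
        selected.extend(ranked[:max_per_species])
    return selected
```

To read one of the published files, the lines from `gzip.open("gtdb-sbdi-sativa.r06rs202.fna.gz", "rt")` can be passed to `read_fasta`. Genome, species and representative status would have to come from GTDB metadata, which this sketch does not parse.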
Provider:
Swedish Biodiversity Data Infrastructure (SBDI)
Created:
2021-10-25