
Sequence database for the single-copy, nuclear-encoded, core photosynthetic gene psbO

Indexed in NIAID Data Ecosystem on 2026-05-01
Download link:
https://www.omicsdi.org/dataset/biostudies-other/S-BSST659
Summary:
We have compiled a psbO sequence database for use as a phytoplankton marker gene (see Pierella Karlusich et al. 2023, Molecular Ecology Resources, doi:10.1111/1755-0998.13592). psbO is present only in photosynthetic organisms (cyanobacteria and eukaryotic phototrophs), mostly as a single copy per genome; in eukaryotes it is nuclear-encoded. The database contains more than 18,000 unique psbO sequences covering cyanobacteria, photosynthetic protists, macroalgae and land plants. It includes sequences retrieved from IMG, NCBI, MMETSP and other sequenced genomes and transcriptomes, as well as from the environmental sequence catalogs of Global Ocean Sampling and Tara Oceans. The taxonomic assignment of environmental psbO sequences was determined by placing their translated sequences on a PsbO protein reference phylogeny.

The reference phylogeny was built as follows. Sequences were retrieved with HMMER version 3.2.1 (http://hmmer.org/), using the gathering-threshold option, by searching the corresponding Pfam domain (MSP; PF01716) against the translated genomes and transcriptomes from the literature and from the PhycoCosm, MMETSP and IMG databases. The translated Pfam region of each sequence was extracted, and the redundancy of the dataset was reduced with CD-HIT version 4.6.4 (W. Li & Godzik, 2006) at an 80% identity cut-off. These translated sequences were then aligned with MAFFT version 6 using the G-INS-i strategy (Katoh & Toh, 2008). The reference phylogenetic tree was generated with PhyML version 3.0 using the LG substitution model with gamma-distributed rates and four rate categories (Guindon et al., 2010). The starting tree was a BIONJ tree, and tree improvement used subtree pruning and regrafting (SPR). Branch support was calculated with the approximate likelihood ratio test (aLRT) using a Shimodaira–Hasegawa-like (SH-like) procedure. Contaminant sequences were carefully removed based on phylogenetic incongruence.
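The reference-tree steps can be summarized as command lines. The sketch below is a hypothetical Python helper that only assembles argument lists and runs nothing. File names are placeholders, and the mapping from the described settings to flags is our assumption, not the authors' published commands: `--cut_ga` for the gathering threshold, `-c 0.80` for CD-HIT, `--globalpair --maxiterate 1000` for G-INS-i, and `-m LG -c 4 -a e -s SPR -b -4` for PhyML. PhyML reads PHYLIP alignments, so a FASTA-to-PHYLIP conversion step, which the description does not mention, is also assumed.

```python
"""Hypothetical helpers that assemble the reference-tree commands.

Only argument lists are built; nothing is executed. File names are
placeholders, and the flag choices are an interpretation of the settings
described in the text, not the authors' exact commands.
"""
from typing import List


def hmmsearch_cmd(hmm: str, proteins: str, domtbl: str) -> List[str]:
    # --cut_ga applies the Pfam gathering thresholds stored in the HMM (PF01716, MSP);
    # --domtblout reports per-domain coordinates, used to cut out the Pfam region
    return ["hmmsearch", "--cut_ga", "--domtblout", domtbl, hmm, proteins]


def cdhit_cmd(inp: str, out: str, identity: float = 0.80) -> List[str]:
    # -c is the identity cut-off; word size 5 is valid for thresholds 0.7-1.0
    return ["cd-hit", "-i", inp, "-o", out, "-c", f"{identity:.2f}", "-n", "5"]


def mafft_ginsi_cmd(inp: str) -> List[str]:
    # G-INS-i = global pairwise alignment with iterative refinement;
    # MAFFT writes the alignment to stdout
    return ["mafft", "--globalpair", "--maxiterate", "1000", inp]


def phyml_cmd(phylip_aln: str) -> List[str]:
    # Amino acids, LG model, 4 gamma rate categories with estimated shape,
    # SPR tree search, SH-like aLRT branch support (-b -4).
    # BIONJ is PhyML's default starting tree, so it needs no flag.
    return ["phyml", "-i", phylip_aln, "-d", "aa", "-m", "LG",
            "-c", "4", "-a", "e", "-s", "SPR", "-b", "-4"]


if __name__ == "__main__":
    for cmd in (
        hmmsearch_cmd("PF01716.hmm", "proteomes.faa", "psbO.domtbl"),
        cdhit_cmd("psbO_domains.faa", "psbO_nr80.faa"),
        mafft_ginsi_cmd("psbO_nr80.faa"),
        phyml_cmd("psbO_nr80.phy"),
    ):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run`; for MAFFT, capture stdout into the alignment file. Keeping the commands as plain lists makes the chosen parameters easy to inspect and log alongside the resulting tree.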
The curated final alignment was used as the reference. To parallelize the taxonomic annotation, environmental sequences were processed in sets of 50: each set was translated, and its PsbO-specific Pfam region (PF01716) was extracted for the following analysis. First, these regions were aligned against the reference alignment using the --add option of MAFFT version 6 with the G-INS-i strategy (Katoh & Toh 2008, Briefings in Bioinformatics 9:286-298). Second, the resulting alignment was used to build a phylogeny as described above. Finally, each sequence was classified according to its grouping, within a monophyletic branch with statistical support >0.7, with reference sequences of the same taxonomic group.
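The final classification rule can be illustrated on a toy tree. This is a simplified sketch, not the authors' code. The `Node` class and `classify` function are hypothetical; branch support is assumed to be stored on each internal node; and a query is assumed to be assigned to a group when the closest ancestor clade with support above 0.7 that contains reference sequences holds references from exactly one taxonomic group. Reading PhyML's Newick output is left out, and the sequence names and taxon labels are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: Optional[str] = None   # leaf label; None for internal nodes
    support: float = 0.0         # SH-like aLRT support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        out = []
        for c in self.children:
            out.extend(c.leaves())
        return out


def classify(query: Node, ref_taxon: Dict[str, str],
             min_support: float = 0.7) -> Optional[str]:
    """Return the single taxon of the references in the closest supported
    clade around `query`, or None if that clade is mixed or none exists."""
    node = query.parent
    while node is not None:
        if node.support > min_support:
            taxa = {ref_taxon[l.name] for l in node.leaves() if l.name in ref_taxon}
            if taxa:
                # Larger clades only add references, so a mixed clade stays mixed above
                return taxa.pop() if len(taxa) == 1 else None
        node = node.parent
    return None


if __name__ == "__main__":
    root = Node()
    diatoms = root.add(Node(support=0.95))
    diatoms.add(Node("ref_Thalassiosira"))
    diatoms.add(Node("ref_Phaeodactylum"))
    q1 = diatoms.add(Node("env_001"))
    mixed = root.add(Node(support=0.9))
    mixed.add(Node("ref_Prochlorococcus"))
    weak = mixed.add(Node(support=0.5))
    weak.add(Node("ref_Ostreococcus"))
    q2 = weak.add(Node("env_002"))
    refs = {"ref_Thalassiosira": "Bacillariophyta",
            "ref_Phaeodactylum": "Bacillariophyta",
            "ref_Prochlorococcus": "Cyanobacteria",
            "ref_Ostreococcus": "Chlorophyta"}
    print(classify(q1, refs))  # Bacillariophyta
    print(classify(q2, refs))  # None: weak clade skipped, next clade is mixed
```

Walking upward from the query and stopping at the first supported clade that contains references is a deliberate simplification: any larger clade is a superset, so if this clade already mixes groups, every clade above it does too.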
Created: 2023-12-11