Multiple sequence alignment of plant SOK proteins

Source: Figshare. Updated: 2025-07-30. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Multiple_sequence_alignment_of_plant_SOK_proteins/28883615/1
Description:
This entry includes Supplementary Data for Chapter 5 of the PhD thesis titled "" by Andriy Volkov (Wageningen University). This dataset contains a multiple sequence alignment of 1086 SOSEKI (SOK) proteins from various plants.

We compiled a set of SOK sequences from various public databases. Most databases provide Pfam/InterPro annotations, which allowed us to retrieve proteins containing an annotated SOSEKI DIX domain (Pfam ID: PF06136). Where no such annotations existed (Metasequoia glyptostroboides, Lupinus angustifolius and Lycopodium clavatum), we queried the genomes using BLASTP to identify sequences highly homologous to the Arabidopsis SOK1 protein. Since genome sequences are available for only a few bryophyte species, we supplemented our dataset with additional bryophyte SOK sequences from the OneKP dataset (van Dop et al., 2020). We filtered out sequences that did not match a SOK DIX domain using PfamScan (Madeira et al., 2024; Mistry et al., 2020), as well as sequences shorter than 200 amino acids (median sequence length before filtering: 427 amino acids). The resulting dataset contained 1086 sequences from 199 species. Sequences were aligned using MAFFT L-Ins-2 (Katoh et al., 2017). This multiple sequence alignment (MSA) was used to build a phylogenetic tree.
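The length filter mentioned in the description can be sketched as follows. This is a minimal illustration and not the author's actual pipeline: the function names `parse_fasta` and `filter_by_length` and the toy sequences are assumptions introduced here. Only the 200-amino-acid threshold comes from the description.

```python
from statistics import median


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect all parts per record.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def filter_by_length(records, min_length=200):
    """Keep only sequences with at least `min_length` residues."""
    return {h: s for h, s in records.items() if len(s) >= min_length}


# Toy input (hypothetical sequences, not taken from the dataset).
example = (
    ">short_seq\n" + "M" * 50 + "\n"
    ">long_seq\n" + "M" * 150 + "\n" + "K" * 150 + "\n"
)

records = parse_fasta(example)
# Median length before filtering, analogous to the figure reported above.
median_length = median(len(s) for s in records.values())
kept = filter_by_length(records)
```

In practice the input would be read from a FASTA file of candidate SOK proteins, and the filtered records would be written back out before alignment.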
Provider:
Volkov, Andriy
Created:
2025-04-28