Multiple sequence alignment of plant SOK proteins
Source: DataCite Commons. Updated: 2025-07-30. Indexed: 2025-05-07.
Download link:
https://figshare.com/articles/dataset/Multiple_sequence_alignment_of_plant_SOK_proteins/28883615/1
Description:
This entry includes Supplementary Data for Chapter 5 of the PhD thesis titled "" by Andriy Volkov (Wageningen University). This dataset contains a multiple sequence alignment of 1086 SOSEKI (SOK) proteins from various plants. We compiled a set of SOK sequences from various public databases. Most databases provide Pfam/InterPro annotations, which allowed us to retrieve proteins containing an annotated SOSEKI DIX domain (Pfam ID: PF06136). Where no such annotations existed (<i>Metasequoia glyptostroboides, Lupinus angustifolius and Lycopodium clavatum</i>), we queried the genomes using BLASTP to identify sequences highly homologous to the <i>Arabidopsis</i> SOK1 protein. Since genome sequences are available for only a few bryophyte species, we supplemented our dataset with additional bryophyte SOK sequences from the OneKP dataset (van Dop et al., 2020). We filtered out sequences that did not match a SOK DIX domain using PFAMScan (Madeira et al., 2024; Mistry et al., 2020), as well as sequences shorter than 200 amino acids (median sequence length before filtering: 427 amino acids). The resulting dataset contained 1086 sequences from 199 species. Sequences were aligned using MAFFT L-Ins-2 (Katoh et al., 2017). This multiple sequence alignment (MSA) was used to build a phylogenetic tree.
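The length filter described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`parse_fasta`, `filter_by_length`, `median_length`) are hypothetical, the input is assumed to be unaligned protein sequences in FASTA format, and only the 200-residue threshold comes from the description. The PFAMScan domain check and the MAFFT alignment step are not reproduced here.

```python
from statistics import median

MIN_LENGTH = 200  # threshold stated in the dataset description


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    Assumes headers are unique; a repeated header would overwrite
    the earlier record.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect all parts.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def filter_by_length(records, min_length=MIN_LENGTH):
    """Keep only sequences with at least min_length residues."""
    return {h: s for h, s in records.items() if len(s) >= min_length}


def median_length(records):
    """Median sequence length, e.g. to report the pre-filtering value."""
    return median(len(s) for s in records.values())
```

In use, one would read the compiled FASTA file into a string, call `median_length` on the parsed records to report the pre-filtering median, and then pass the records through `filter_by_length` before alignment.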
Provider:
figshare
Created:
2025-04-28



