Phylogeny of Arabidopsis eukaryotic kinases

Source: DataONE. Updated 2014-07-10; indexed 2024-06-27.
Download link: https://search.dataone.org/view/null
Description:
Phylogeny of 942 eukaryotic kinase domains from Arabidopsis proteins. An alignment of 491 eukaryotic protein kinases was downloaded on Feb 9, 2012 from http://kinase.com/human/kinome/phylogeny.html and used to compute a profile hidden Markov model (HMM) with the software HMMer [12]. All representative gene models from Arabidopsis thaliana (TAIR10_pep) were searched against the profile HMM using HMMer. In total, 1,045 sequences generated hits with an E-value lower than 0.01. These sequences were then aligned to the profile HMM. Two sequences (AT1G11300.1 and AT2G32800.1) each had two distinct kinase domains, so both domains of each gene were included as separate sequences, giving an alignment of 1,047 distinct Arabidopsis kinase domains (Additional File 5). All alignment positions not part of the profile HMM were removed from the alignment. In addition, all sequences covering less than 70% of the profile HMM were removed using the software REAP [13]. The 70% cutoff corresponds to a threshold in the sequence coverage distribution: coverage declines sharply for the 111 kinase domains that fall below it (Additional File 8). The final alignment consisted of 317 columns from 942 sequences (kinase domains) and was used to compute a maximum likelihood phylogeny and 100 bootstrap replicates using the PROTCATWAG model of the RAxML program [14].
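
The coverage filter described above can be illustrated with a minimal Python sketch. It is not the REAP implementation, whose exact coverage definition is not given here. The sketch assumes that the alignment has already been trimmed to profile HMM columns, that gaps are written as `-` or `.`, and that coverage is the fraction of columns holding a residue. The names `coverage`, `filter_by_coverage`, and the toy sequences are hypothetical.

```python
from typing import Dict

# Assumption: gaps are encoded as '-' or '.' in the trimmed alignment.
GAP_CHARS = frozenset("-.")


def coverage(aligned_seq: str) -> float:
    """Return the fraction of alignment columns occupied by a residue."""
    if not aligned_seq:
        return 0.0
    residues = sum(1 for ch in aligned_seq if ch not in GAP_CHARS)
    return residues / len(aligned_seq)


def filter_by_coverage(alignment: Dict[str, str],
                       min_coverage: float = 0.70) -> Dict[str, str]:
    """Keep sequences whose coverage is at least min_coverage.

    Sequences covering less than the cutoff are dropped, mirroring the
    "less than 70%" rule in the description.
    """
    return {name: seq for name, seq in alignment.items()
            if coverage(seq) >= min_coverage}


# Toy alignment with 10 columns (hypothetical data, not from the dataset).
toy = {
    "domain_a": "MKV-LLGAGS",  # 9 of 10 columns covered
    "domain_b": "M---L--A--",  # 3 of 10 columns covered
    "domain_c": "MKVAL-G--S",  # 7 of 10 columns covered
}

kept = filter_by_coverage(toy)
print(sorted(kept))
```

On the real data the same function would be applied to the 317-column alignment. The 0.70 default only restates the cutoff reported in the description.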

Created: 2014-07-10