
An efficient method for measuring the similarity of protein sequences

Taylor & Francis Group | Updated 2016-05-26 | Indexed 2026-04-16
Download link:
https://figshare.com/articles/An_efficient_method_for_measuring_the_similarity_of_protein_sequences/3188995/2
Description:
An accurate numerical descriptor for protein sequences is introduced. It consists of every set of three successive amino acids in the sequence (a triplet), read from left to right, together with the distances between each two successive amino acids in the triplet, such that the sum of these distances does not exceed 8. The descriptor combines two features: the amino acid composition and the position of each amino acid relative to the other nearby amino acids. It is used to measure the similarity between protein sequences in three sets: NADH dehydrogenase subunit 5 (ND5) proteins of different species, 24 transferrin proteins from vertebrates, and 12 proteins of baculoviruses. The results correlate highly with those of the ClustalW program, and the correlation coefficients are higher than those obtained in many other related works.
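
The description above leaves the exact construction open, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a triplet is any three residues at positions i < j < k, that the two distances are d1 = j - i and d2 = k - j, and that d1 + d2 must not exceed 8. The function names `triplet_descriptor` and `cosine_similarity` are hypothetical, and cosine similarity is a stand-in: the description does not say which measure is used to compare two descriptors.

```python
from collections import Counter
from math import sqrt

MAX_TOTAL_DISTANCE = 8  # bound on d1 + d2 stated in the description


def triplet_descriptor(seq, max_total=MAX_TOTAL_DISTANCE):
    """Count gapped triplets (a, b, c, d1, d2) with d1 + d2 <= max_total.

    Assumption: d1 and d2 are the position differences between the first
    and second, and the second and third residues of the triplet.
    """
    counts = Counter()
    n = len(seq)
    for i in range(n):
        for d1 in range(1, max_total):
            j = i + d1
            if j >= n:
                break
            for d2 in range(1, max_total - d1 + 1):
                k = j + d2
                if k >= n:
                    break
                counts[(seq[i], seq[j], seq[k], d1, d2)] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two count descriptors (assumed measure)."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than three residues has no triplets
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the three sets.
    first = triplet_descriptor("MKLLAVTAGLLS")
    second = triplet_descriptor("MKLLSVTAGLLA")
    print(round(cosine_similarity(first, second), 3))
```

Because each key stores both the residues and their spacing, the counts reflect composition and relative position together, which are the two features the description names. A pairwise matrix of such scores over a set of sequences could then be compared against ClustalW output.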

Provided by:
A. El-Lakkani; M. Lashin
Created:
2016-04-26