Identification and characterization of subfamily-specific signatures in a large protein superfamily by a hidden Markov model approach
Source: PubMed Central | Updated: 2002-01-10 | Indexed: 2026-05-16
Download link:
https://pmc.ncbi.nlm.nih.gov/articles/PMC65048/
Description:
BACKGROUND: Most profile and motif databases strive to classify protein sequences into a broad spectrum of protein families. The next step of such database studies should include the development of classification systems capable of distinguishing between subfamilies within a structurally and functionally diverse superfamily. This would help elucidate the sequence-structure-function relationships of proteins.

RESULTS: Here, we present a method that assigns sequences to subfamilies by employing hidden Markov models (HMMs) to find windows of residues that are distinct among subfamilies (called signatures). The method starts with a multiple sequence alignment (MSA) of the subfamily. Then, we build an HMM database representing all fixed-size sliding windows of the MSA. Finally, we construct an HMM histogram of the matches of each sliding window across the entire superfamily. To illustrate the efficacy of the method, we applied the analysis to find subfamily signatures in two well-studied superfamilies: the cadherin and the EF-hand protein superfamilies. As a corollary, the HMM histograms of the analyzed subfamilies revealed information about their Ca²⁺ binding sites and loops.

CONCLUSIONS: The method can be used to create HMM databases for diagnosing subfamilies of protein superfamilies; these databases complement broad profile and motif databases such as BLOCKS, PROSITE, Pfam, SMART, PRINTS and InterPro.
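
The three steps in the RESULTS paragraph (subfamily MSA, one model per fixed-size sliding window, histogram of matches across the superfamily) can be sketched roughly as follows. This is only an illustration under strong assumptions, not the authors' implementation: each window is modeled with a simple ungapped log-odds profile against a uniform background as a stand-in for a real profile HMM, and the function names, the pseudocount, and the score `threshold` are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sliding_windows(msa, size):
    """Yield (start, rows) for every window of `size` alignment columns."""
    length = len(msa[0])
    for start in range(length - size + 1):
        yield start, [row[start:start + size] for row in msa]


def build_profile(window_rows, pseudocount=1.0):
    """Per-column log-odds scores vs. a uniform background.

    ASSUMPTION: this ungapped profile replaces the profile HMM of the
    original method; gap characters in the alignment are simply ignored.
    """
    profile = []
    for column in zip(*window_rows):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return profile


def best_score(profile, sequence):
    """Best ungapped score of the profile at any offset in `sequence`."""
    width = len(profile)
    best = float("-inf")
    for i in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(profile[j].get(sequence[i + j], 0.0) for j in range(width))
        best = max(best, score)
    return best


def window_histogram(subfamily_msa, superfamily_seqs, size, threshold):
    """Map each window start to the number of superfamily sequences it matches."""
    histogram = {}
    for start, rows in sliding_windows(subfamily_msa, size):
        profile = build_profile(rows)
        histogram[start] = sum(
            1 for seq in superfamily_seqs
            if best_score(profile, seq) >= threshold
        )
    return histogram


if __name__ == "__main__":
    # Toy data, purely illustrative.
    msa = ["ACDEFG", "ACDEFG", "ACDQFG"]
    superfamily = ["ACDEFG", "WWWWWW"]
    print(window_histogram(msa, superfamily, size=3, threshold=3.0))
```

In a histogram of this kind, windows that match only members of the subfamily used to build them are the candidate signatures, while windows that match most of the superfamily reflect regions shared across subfamilies. A real analysis would replace the profile stand-in with profile HMMs built and searched by a dedicated HMM package.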
Provider:
BMC
Created:
2002-01-10



