Identified clustering methods for automated sequence-based gene family classification.
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/_Identified_clustering_methods_for_automated_sequence_based_gene_family_classification_/494849
Description:
Evaluated methods are listed first, and the two resulting sub-lists are then sorted chronologically by publication date (older methods first). Methods are categorized according to classification methodology, protein sequence similarity measure, and whether and how key challenges of gene family classification are addressed. Distant homologs indicates whether and how detection of remote homologs is addressed. Multi-domain indicates whether and how the problem of multi-domain proteins and promiscuous domains is addressed. Tree cutting applies to hierarchical clustering techniques only and refers to the functionality of automatically cutting the hierarchical tree of nested clusters into a final, distinct set of putative protein families. Large-scale indicates whether larger proteome-scale data sets (>20,000 proteins) can be processed on a desktop computer in reasonable time (hours, not days). Standalone indicates whether the program is available as a stand-alone application that can be installed and run locally. Evaluated indicates whether the method was amenable to performance evaluation in this paper and, if not, why not. Abbreviations: n/a: not applicable; n/d: not determined; SL: single-linkage; CL: clustering; LR: logistic regression; NN: neural network; SW: Smith-Waterman; MCL: Markov clustering; HMM: hidden Markov model.
Created:
2010-10-15
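
Illustration (not part of the dataset): the "Tree cutting" category in the description refers to turning a hierarchy of nested clusters into one flat set of putative families. The sketch below shows the simplest case, assuming single-linkage (SL) clustering cut at a fixed distance threshold, which is equivalent to taking connected components of the graph whose edges join proteins at or below that distance. The function name `single_linkage_cut`, the protein identifiers, the toy distances, and the thresholds are all hypothetical and chosen only for illustration; none of the listed methods is claimed to work this way, and automatic tree cutting in real tools typically selects the cut level rather than taking it as input.

```python
from itertools import combinations


def single_linkage_cut(items, distance, threshold):
    """Flat clusters from single-linkage clustering cut at `threshold`.

    Two items end up in the same cluster if they are connected by a chain
    of pairs whose distance is <= threshold (hypothetical helper).
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if distance(a, b) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(members) for members in clusters.values())


# Toy pairwise distances (assumed values; unlisted pairs default to 1.0).
toy = {
    ("p1", "p2"): 0.10,
    ("p2", "p3"): 0.20,
    ("p1", "p3"): 0.60,
    ("p4", "p5"): 0.15,
}


def toy_distance(a, b):
    return toy.get((a, b), toy.get((b, a), 1.0))


proteins = ["p1", "p2", "p3", "p4", "p5"]
loose = single_linkage_cut(proteins, toy_distance, threshold=0.30)
strict = single_linkage_cut(proteins, toy_distance, threshold=0.12)
print(loose)
print(strict)
```

With the looser threshold, p3 joins the family of p1 through p2 even though p1 and p3 are themselves far apart. This chaining behavior of single-linkage is one reason the choice of cut level matters for family classification.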



