Identified clustering methods for automated sequence-based gene family classification.

Indexed from NIAID Data Ecosystem on 2026-03-06
Download link:
https://figshare.com/articles/dataset/_Identified_clustering_methods_for_automated_sequence_based_gene_family_classification_/494849
Description:
Evaluated methods are listed first, and the two resulting sub-lists are then sorted chronologically by publication date (older methods first). Methods are categorized according to classification methodology, protein sequence similarity measure, and if and how key challenges of gene family classification are addressed. Distant homologs indicates if and how detection of remote homologs is addressed. Multi-domain indicates if and how the problem of multi-domain proteins and promiscuous domains is addressed. Tree cutting applies to hierarchical clustering techniques only and refers to the functionality of automatically cutting the hierarchical tree of nested clusters into a final, distinct set of putative protein families. Large-scale indicates whether larger proteome-scale data sets (>20,000 proteins) can be processed on a desktop computer in reasonable time (hours but not days). Standalone indicates whether the program is available as a stand-alone application that can be installed and run locally. Evaluated indicates whether the method was amenable to performance evaluation in this paper and, if not, why. Abbreviations: n/a: not applicable; n/d: not determined; SL: single-linkage; CL: clustering; LR: logistic regression; NN: neural network; SW: Smith-Waterman; MCL: Markov clustering; HMM: hidden Markov model.
Created:
2010-10-15