
A domain classification with an ensemble of HMMs.

Figshare | Updated 2015-12-02 | Indexed 2026-04-29
Download link:
https://figshare.com/articles/dataset/_A_domain_classification_with_an_ensemble_of_HMMs_/685769
Description:
The first column lists the substrates, with the number of sequences analyzed for each in brackets. The second column gives the number of sequences correctly classified by our ensemble of HMMs on the non-redundant reference dataset of experimentally validated, substrate-specific A domain sequences collected from reference databases, the literature and [43] (set K = 571 sequences). The third column gives the number of sequences that received a false annotation (f), and the fourth column gives the number of sequences that scored above threshold (at, grey and numbers in italics). Columns five, six and seven provide the same information for the Leave One Out cross validation.

$ See the legend of Figure 3 for the systematic names of the various substrates. The category 'other' includes those substrates that are represented only once in the domain sequence dataset. They include: 2-oxo-isovaleric-acid, 3-methyl-glutamate (3-me-glu), 4-propyl-proline (4ppro), 2-amino-9,10-epoxy-8-oxodecanoic acid (aeo), alaninol, alle, alpha-hydroxy-isocaproic acid, an, (S)-2-amino-8-oxodecanoic acid (aoda), l-capreomycidine (cap), d-lysergic acid (d-lyserg), hydroxyl-asn, hmp-D, LDAP, MeHOval, N-methyl-phenylalanine (mephe), N-methyl valine (meval), N-(1,1-dimethyl-1-allyl)tryptophan, phenyl-glycine (phg), s-nmethoxy-tryptophan, (4S)-5,5,5-trichloro-leucine (tcl), valinol (vol).

*, # and &: For particular substrates no representative models were present in one or more of the predictors that were compared in Table 3 (*, NRPSsp; #, NRPS predictor 2; &, ensemble HMMs). Ideally the related sequences should obtain a score above threshold.
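The three counts reported per evaluation (correct, false annotation, above threshold) can be illustrated with a minimal Leave One Out sketch. Everything in it is an assumption made for illustration: the function names, the toy identity-based scorer standing in for the HMM ensemble, the toy sequences, the threshold value, and the choice to tally "above threshold" independently of correctness. It is not the authors' pipeline.

```python
from collections import Counter


def train(samples):
    """Toy stand-in for model building: group training sequences by substrate."""
    models = {}
    for seq, substrate in samples:
        models.setdefault(substrate, []).append(seq)
    return models


def identity(a, b):
    """Fraction of matching positions between two sequences (toy score)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def score(models, seq):
    """Score a sequence against every substrate model; higher is better."""
    return {s: max(identity(seq, ref) for ref in refs) for s, refs in models.items()}


def leave_one_out_tally(samples, threshold):
    """Hold out each sequence in turn, retrain on the rest, and tally outcomes."""
    tally = Counter()
    for i, (seq, substrate) in enumerate(samples):
        models = train(samples[:i] + samples[i + 1:])  # exclude the held-out sequence
        scores = score(models, seq)
        best, best_score = max(scores.items(), key=lambda kv: kv[1])
        tally["correct" if best == substrate else "false"] += 1
        if best_score >= threshold:  # assumption: counted regardless of correctness
            tally["above_threshold"] += 1
    return tally


# Hypothetical toy data: (sequence, substrate) pairs, not real A domain sequences.
SAMPLES = [
    ("AAAA", "ala"), ("AAAT", "ala"),
    ("GGGG", "gly"), ("GGGC", "gly"),
    ("CCCC", "other"),
]

if __name__ == "__main__":
    print(dict(leave_one_out_tally(SAMPLES, threshold=0.5)))
```

In this toy setup the singleton substrate has no model once it is held out, so it can only be misannotated, which mirrors why substrates represented once are grouped under 'other'.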
Created: 2015-12-02