Code and data for CFreeEnS
Source: DataCite Commons. Updated: 2025-06-10. Indexed: 2025-04-16.
Download link:
https://researchdata.ntu.edu.sg/citation?persistentId=doi:10.21979/N9/4YDZED
Description:
A method called Context-Free Encoding Scheme (CFreeEnS) was proposed to encode protein sequence pairs into a numeric matrix. CFreeEnS draws on rich information about the physicochemical and structural properties of amino acids. Because the encoding preserves information about conserved amino acid properties, learning methods (e.g. random forest) can capture the cross-subtype antigenic pattern of influenza viruses. Moreover, since CFreeEnS does not depend on carefully designed features, it should be applicable to other bioinformatics tasks that measure phenotype similarity from sequences. The method was tested on four additional datasets: the iAMP-2L dataset, which distinguishes antimicrobial from non-antimicrobial peptides [5]; the tumor homing peptides dataset (TumorHPD); the HemoPI dataset, which includes hemolytic, non-hemolytic and semi-hemolytic peptides; and the phage virion proteins dataset. Prediction accuracy under 10-fold cross-validation is compared with two reported methods. The results show that CFreeEnS outperforms, or is at least competitive with, the traditional method using handcrafted features and a state-of-the-art method named m-NGSG.
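To make the idea of encoding a sequence pair as a numeric matrix concrete, the following is a minimal Python sketch. It is not the published CFreeEnS implementation: the choice of two properties (the Kyte-Doolittle hydropathy index and a simplified side-chain charge), the use of absolute per-position differences, and the function name `encode_pair` are all assumptions made for illustration. The actual property set and encoding are defined in the dataset's code.

```python
# Illustrative sketch only; NOT the published CFreeEnS code.
# Property 1: Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Property 2 (assumption): simplified side-chain charge near neutral pH,
# with histidine treated as neutral.
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

# Hypothetical property list; a real encoding would use many more properties.
PROPERTIES = [HYDROPATHY, CHARGE]


def encode_pair(seq_a, seq_b):
    """Encode two aligned, equal-length sequences as a numeric matrix.

    Rows correspond to properties, columns to sequence positions. Each
    entry is the absolute difference of the property values of the two
    residues at that position (an assumed way of combining the pair).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        [abs(prop[a] - prop[b]) for a, b in zip(seq_a, seq_b)]
        for prop in PROPERTIES
    ]


if __name__ == "__main__":
    # Each row is one property; each column is one aligned position.
    for row in encode_pair("AKDV", "VKEV"):
        print(row)
```

Flattening the resulting matrix yields a fixed-length feature vector for each sequence pair, which could then be passed to a learner such as a random forest. Residues outside the 20 standard amino acids (for example gap characters) raise a `KeyError` in this sketch and would need explicit handling.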
Provider:
DR-NTU (Data)
Created:
2019-04-03



