
Reversible auto-encoding of amino-acid residues in reduced space: an application to predicting DNA-binding proteins

Source: bridges.monash.edu | Updated: 2017-11-21 | Indexed: 2025-03-26
Download link:
https://bridges.monash.edu/articles/dataset/Reversible_auto-encoding_of_amino-acid_residues_in_reduced_space_an_application_to_predicting_DNA-binding_proteins/5619529/1
Description:
A number of recent studies have aimed to predict binding sites and other structural and sequence features of proteins using the local amino acid sequence as input to a machine learning system. This requires representing amino acids in numerical space, typically with a sparse encoding of 20 bits per residue. The number of trainable parameters grows substantially with each neighboring residue added to the input window, so the technique is restricted to predicting properties for which large amounts of data are available. Thus, there is a need to find alternatives to this type of sparse encoding. Here a method of auto-encoding the 20-dimensional sparse representation into a lower-dimensional space is developed with amino acids in mind, although the method is general. It is shown that the 20-bit sparse encoding can be reduced to a 6-dimensional real space without loss of information, and to even lower dimensions with varying degrees of information loss. An application to predicting DNA-binding sites was tested to assess the validity of the proposed method, and the auto-encoded neural network prediction was observed to be comparable or superior to the sparse encoding system.

PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
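
The input sizes mentioned in the description can be made concrete with a minimal Python sketch. It is not the authors' implementation: the 6-dimensional codes below are random placeholders standing in for a learned auto-encoder, and the function names, window sizes, and hidden-unit count are assumptions chosen only for illustration. The sketch shows how a sparse window encoding uses 20 values per residue, how a dense code uses 6, and that a dense code remains reversible as long as every residue has a distinct code.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_window(window):
    """Sparse encoding: 20 bits per residue, concatenated over the window."""
    out = np.zeros((len(window), 20))
    for pos, aa in enumerate(window):
        out[pos, INDEX[aa]] = 1.0
    return out.ravel()

# Assumption: random 6-d codes stand in for a learned encoder.
# A trained auto-encoder would produce these values instead.
rng = np.random.default_rng(0)
CODES = rng.normal(size=(20, 6))

def dense_window(window):
    """Dense encoding: 6 real values per residue, concatenated over the window."""
    return np.concatenate([CODES[INDEX[aa]] for aa in window])

def decode_window(vec):
    """Recover residues by picking the nearest code for each 6-d chunk."""
    chunks = vec.reshape(-1, 6)
    dist = np.linalg.norm(chunks[:, None, :] - CODES[None, :, :], axis=2)
    return "".join(AMINO_ACIDS[i] for i in dist.argmin(axis=1))

def first_layer_weights(window_size, dims_per_residue, hidden_units):
    """Weights between the input layer and one hidden layer (biases ignored)."""
    return window_size * dims_per_residue * hidden_units

if __name__ == "__main__":
    window = "MKVLAAG"  # hypothetical 7-residue window
    print(one_hot_window(window).shape, dense_window(window).shape)
    print(decode_window(dense_window(window)) == window)
    print(first_layer_weights(7, 20, 10), first_layer_weights(7, 6, 10))
```

With a 7-residue window and 10 hidden units (both hypothetical values), the first layer needs 7 × 20 × 10 weights under sparse encoding and 7 × 6 × 10 under a 6-dimensional code, which is the parameter saving the description refers to.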

Provided by:
bridges.monash.edu