
Supporting data for "Learning A Generalized Graph Transformer for Protein Function Prediction in Dissimilar Sequences"

DataCite Commons · Updated 2025-05-26 · Indexed 2025-04-15
Download link:
http://gigadb.org/dataset/102588
Description:
In the face of a growing gap between high-throughput sequence data and low-throughput experimental studies, deep learning has emerged as a promising alternative. Many data-driven approaches enable fast and accurate prediction of protein functions. However, the inherently statistical nature of deep learning techniques may limit their ability to generalize to novel non-homologous proteins that diverge significantly from known ones.

In this work, we propose a novel, generalized approach named Graph Adversarial Learning with Alignment (GALA) for protein function prediction. GALA integrates a graph transformer architecture with an attention pooling module to extract information from both protein sequences and structures, enabling unified learning of protein structural representations. Notably, GALA incorporates a domain discriminator conditioned on both the representations and the predicted probabilities; the discriminator is trained adversarially so that the learned representations stay invariant across diverse environments. To exploit the abundant label information, we generate label embeddings in the hidden space and explicitly align them with protein representations. Benchmarked on datasets derived from the PDB and Swiss-Prot databases, GALA achieves performance comparable to several state-of-the-art methods. Furthermore, GALA demonstrates outstanding interpretability by identifying key functional residues associated with GO terms through class activation mapping.

GALA, which leverages adversarial learning and label embedding alignment to acquire domain-invariant protein representations, exhibits outstanding generalizability in function prediction for proteins from previously unseen sequence space. By using structures predicted by AlphaFold2, GALA holds significant potential for annotating the functions of newly discovered sequences.
The implementation of GALA is available at https://github.com/fuyw-aisw/GALA.
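The description names three building blocks: attention pooling of residue features into one protein vector, label embeddings aligned with that vector, and a domain discriminator conditioned on both the representation and the predicted probabilities. The sketch below illustrates one plausible form of each in dependency-free Python. It is not the authors' code (see the GALA repository for that). Three things are assumptions: the scoring vector `w`, the sigmoid-of-dot-product label scoring, and the CDAN-style outer-product conditioning. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(residues: List[Vector], w: Vector) -> Vector:
    """Pool per-residue embeddings into a single protein vector.

    Each residue h_i gets a scalar score dot(h_i, w). The scores are
    softmax-normalized and used as weights in a weighted sum.
    `w` stands in for a (hypothetical) learned scoring vector.
    """
    weights = softmax([dot(h, w) for h in residues])
    dim = len(residues[0])
    return [sum(a * h[d] for a, h in zip(weights, residues)) for d in range(dim)]


def label_probabilities(z: Vector, label_emb: List[Vector]) -> Vector:
    """Multi-label GO-term probabilities from label embeddings.

    Assumption: each GO term k has an embedding e_k in the same hidden
    space as z, and its probability is sigmoid(dot(z, e_k)).
    """
    return [1.0 / (1.0 + math.exp(-dot(z, e))) for e in label_emb]


def conditioned_input(z: Vector, p: Vector) -> Vector:
    """Flattened outer product of z and p, used as discriminator input.

    This CDAN-style multilinear conditioning is one common way to feed
    both the representation and the predictions to a domain
    discriminator. It is an illustrative assumption, not necessarily
    GALA's exact design.
    """
    return [zi * pj for zi in z for pj in p]


if __name__ == "__main__":
    residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    z = attention_pool(residues, w=[0.5, -0.5])
    probs = label_probabilities(z, [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
    features = conditioned_input(z, probs)
    print(len(features))  # 6 = 2 hidden dims x 3 GO terms
```

The outer product lets the discriminator see which directions of the representation co-occur with which predicted labels. In a real model the discriminator would sit behind a gradient-reversal layer (or an equivalent min-max objective) so that the encoder learns to fool it.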
Provided by:
GigaScience Database
Created:
2024-10-16