FAPM: Functional annotation of proteins using multi-modal models beyond structural modeling
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.m905qfv9p
Resource description:
Assigning accurate property labels to proteins, like functional terms and
catalytic activity, is challenging, especially for proteins without
homologs and “tail labels” with few known examples. Unlike previous
methods that mainly focused on protein sequence features, we use a
pretrained large natural language model to understand the semantic meaning
of protein labels. Specifically, we introduce FAPM, a contrastive
multi-modal model that links natural language with protein sequence
language. This model combines a pretrained protein sequence model with a
pretrained large language model to generate labels, such as Gene Ontology
(GO) functional terms and catalytic activity predictions, in natural
language. Our results show that FAPM excels in understanding protein
properties, outperforming models based solely on protein sequences or
structures. It achieves state-of-the-art performance on public benchmarks
and in-house experimentally annotated phage proteins, which often have few
known homologs. Additionally, FAPM's flexibility allows it to
incorporate extra text prompts, like taxonomy information, enhancing both
its predictive performance and explainability. This novel approach offers
a promising alternative to current methods that rely on multiple sequence
alignment for protein annotation.
Provider:
Dryad
Created:
2024-07-16



