five

FAPM: Functional annotation of proteins using multi-modal models beyond structural modeling

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.m905qfv9p
下载链接
链接失效反馈
官方服务:
资源简介:
Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and “tail labels” with few known examples. Unlike previous methods that mainly focused on protein sequence features, we use a pretrained large natural language model to understand the semantic meaning of protein labels. Specifically, we introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation.
创建时间:
2024-07-16
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作