ProtNote: a multimodal method for protein-function annotation
Source: NIAID Data Ecosystem; indexed 2026-05-02
Download link:
https://zenodo.org/record/13897919
Description:
Understanding protein sequence-function relationships is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions, and scientists continually update the set of possible functions. While deep learning methods have demonstrated promise for protein function prediction, current models are limited to predicting only those functions on which they were trained. Here, we introduce ProtNote, a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its train set, but also generalizes to unseen and novel functions in zero-shot test settings. We envision that ProtNote will enhance protein function discovery by enabling scientists to use free text inputs, without restriction to predefined labels – a necessary capability for navigating the dynamic landscape of protein biology.
Created:
2024-10-13



