Protein Large Language Models Can Predict Flavivirus Protease Target Specificity
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Protein_Large_Language_Models_Can_Predict_Flavivirus_Protease_Target_Specificity/31910617
Description:
Viral proteases are essential enzymes in many viral strains, playing
a crucial role in the viral replication cycle. They are key targets
for antiviral drug development and have significant implications for
viral pathogenesis. To investigate flavivirus protease substrate promiscuity, Yellow Fever virus protease (YFP),
West Nile virus protease (WNP), Zika virus protease (ZVP), Usutu virus
protease (UVP), and Rocio virus protease (RVP) were recombinantly
expressed in E. coli BL21(DE3) and
purified. Mass-spectrometric Proteomic Identification of protease
Cleavage Sites (PICS) was performed using peptide libraries derived
from a murine cell line lysate. A surprisingly high promiscuity in
protease substrate specificity was detected for all five viral proteases,
with a recurrence of arginine in the P1 position. Using homology modeling,
specific subsites could be identified. However, the promiscuity of
peptide binding was difficult to elucidate using these models. For
these reasons, the ProtTrans protein language model (pLM) was
fine-tuned with the obtained peptide sequences. The ProtTrans
T5-Encoder model, originally trained to predict amino acids within the
same protein chain on a very large corpus of protein sequences, was fine-tuned
with target peptides from the PICS experiments and decoy peptides and
could distinguish the two groups with up to 76% test-set accuracy.
Dimensionality reduction indicated that the T5 embeddings do indeed
contain information useful for recognizing protein–peptide
interactions. These results confirm the usefulness of pLMs for the
prediction of protein–protein interactions and thus have important
implications for antiviral drug design.
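
The recurrence of arginine at the P1 position described above can be illustrated with a per-position residue frequency profile. The following is a minimal Python sketch, assuming each cleavage site is represented as an 8-residue window spanning P4 to P4'. The example sequences, the POSITIONS list, and the function name position_frequencies are hypothetical and are not taken from the dataset or from the authors' analysis.

```python
from collections import Counter

# Assumed window layout: four residues before and four after the scissile bond.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# Made-up windows for illustration only; these are NOT PICS results.
example_windows = [
    "LKKRGSAL",
    "AGRRSVTE",
    "TQKRGGDL",
    "VLGKSAEM",
    "EAKRSLTG",
]


def position_frequencies(windows):
    """Return {position: {residue: fraction}} for equal-length windows."""
    if not windows:
        raise ValueError("need at least one window")
    if any(len(w) != len(POSITIONS) for w in windows):
        raise ValueError("every window must contain exactly 8 residues")
    profile = {}
    for index, name in enumerate(POSITIONS):
        # Count which residue each window carries at this position.
        counts = Counter(w[index] for w in windows)
        profile[name] = {res: n / len(windows) for res, n in counts.items()}
    return profile


profile = position_frequencies(example_windows)
p1_arginine = profile["P1"].get("R", 0.0)
print(f"Fraction of windows with Arg at P1: {p1_arginine:.2f}")
```

With real PICS output, the windows would come from the identified cleavage products rather than a hard-coded list, and the same profile could be compared across the five proteases.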
Created:
2026-04-01



