Protein Large Language Models Can Predict Flavivirus Protease Target Specificity

Indexed from NIAID Data Ecosystem on 2026-05-10

Download link:

https://figshare.com/articles/dataset/Protein_Large_Language_Models_Can_Predict_Flavivirus_Protease_Target_Specificity/31910617

Description:

Viral proteases are essential enzymes in many viruses and play a crucial role in the viral replication cycle. They are key targets for antiviral drug development and have significant implications for viral pathogenesis. To address the issue of Flavivirus protease substrate promiscuity, Yellow Fever virus protease (YFP), West Nile virus protease (WNP), Zika virus protease (ZVP), Usutu virus protease (UVP), and Rocio virus protease (RVP) were recombinantly expressed in E. coli BL21(DE3) and purified. Mass-spectrometric Proteomic Identification of protease Cleavage Sites (PICS) was performed using peptide libraries derived from a murine cell line lysate. All five viral proteases showed surprisingly high promiscuity in substrate specificity, with arginine recurring at the P1 position. Homology modeling identified specific subsites, but the promiscuity of peptide binding was difficult to explain with these models. For this reason, the ProtTrans protein language model (pLM) was fine-tuned on the obtained peptide sequences. The ProtTrans T5 encoder, originally trained on a very large corpus of protein sequences to predict amino acids within the same protein chain, was fine-tuned with target peptides from the PICS experiments and with decoy peptides, and classified these groups with up to 76% test-set accuracy. Dimensionality reduction indicated that the T5 embeddings contain information useful for recognizing protein–peptide interactions. These results confirm the usefulness of pLMs for the prediction of protein–protein interactions and thus have important implications for antiviral drug design.
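As a rough illustration of how a recurrence of arginine at P1 can be read out of cleavage-site data, the sketch below tallies residue frequencies at one position of aligned cleavage windows. The window layout (P4 to P4'), the function name `position_frequencies`, and the four example sequences are assumptions made up for this example; they are not taken from the dataset, and this is not the analysis pipeline used by the authors.

```python
from collections import Counter

# Hypothetical 8-residue windows spanning P4..P4'; the scissile bond lies
# between index 3 (P1) and index 4 (P1'). These sequences are invented for
# illustration only and do not come from the PICS experiments.
WINDOWS = ["AGKRSLTE", "LTRRGAVD", "VQAKGSEL", "GLKRAGTP"]
P1_INDEX = 3  # positions: P4, P3, P2, P1 | P1', P2', P3', P4'


def position_frequencies(windows, index):
    """Return the relative frequency of each residue at one window position."""
    counts = Counter(window[index] for window in windows)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


p1_freq = position_frequencies(WINDOWS, P1_INDEX)
for residue, freq in sorted(p1_freq.items(), key=lambda kv: -kv[1]):
    print(f"P1 {residue}: {freq:.2f}")
```

Running the same function over every index of the window would give a simple position-specific frequency table, which is one common way to summarize protease specificity before moving to a learned model.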

Created:

2026-04-01
