List of tested HA sequences.

Source: Figshare. Updated 2025-08-14; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/List_of_tested_HA_sequences_/29915609

Description:

Influenza viruses are known to evade host immune responses by shielding vulnerable surface protein epitopes via N-linked glycosylation. A program titled Future Sequon Finder was developed to predict the locations in which glycan binding sites are most likely to emerge in future influenza hemagglutinin proteins. The predictive modeling approach considers how closely sites in currently circulating strains resemble glycosylation sequons at the nucleic acid level, the surface accessibility of those sites, and the mutation frequency of amino acids at those sites that would need to change to form a glycosylation sequon. The efficacy of this model is tested using historic human H1N1 and H3N2 influenza strains along with swine H1N1 strains. Through this analysis, it is revealed that glycosylation addition events in influenza hemagglutinin proteins are typically the result of single nucleotide mutation events. It is also demonstrated that site-specific mutation frequency and surface accessibility are powerful predictors of which sites will become glycosylated in human influenza viruses when considered with the genetic composition of the sites in question. Having been designed to incorporate these factors, the program successfully predicted almost every historic sequon addition event (28/30 in human IFVs, 14/15 in swine IFVs). For human strains, it also ranked the correct near-sequons highly among falsely predicted sequons based on site-specific mutation frequency. After demonstrating the model’s power with historical data, the program was used to predict future HA glycosylation sequon locations based on currently circulating human influenza viruses.
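The "near-sequon" idea in the description can be illustrated with a small sketch. It is not the Future Sequon Finder implementation. It only assumes the standard N-X-S/T sequon definition (X is any residue except proline) and the standard genetic code, and it ignores the surface-accessibility and mutation-frequency factors the description mentions. The function names (`translate`, `find_sequons`, `find_near_sequons`) are hypothetical, and the input is assumed to be an in-frame coding sequence containing only A, C, G, and T.

```python
# Illustrative sketch only: names and structure are assumptions, not the
# authors' program. Input is assumed to be in-frame DNA over A/C/G/T.

BASES = "ACGT"

# Standard genetic code, with codons enumerated in T, C, A, G order.
_ORDER = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
    for k, c in enumerate(_ORDER)
}


def translate(nt):
    """Translate in-frame DNA to one-letter amino acids ('*' marks a stop)."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def is_sequon(tri):
    """True for an N-X-S/T tripeptide where X is neither proline nor a stop."""
    return len(tri) == 3 and tri[0] == "N" and tri[1] not in "P*" and tri[2] in "ST"


def find_sequons(protein):
    """Return 0-based start indices of N-X-S/T sequons in a protein string."""
    return [i for i in range(len(protein) - 2) if is_sequon(protein[i:i + 3])]


def find_near_sequons(nt):
    """Map codon index -> list of (nucleotide position, new base).

    A three-codon window is reported when it is not currently a sequon
    but a single nucleotide substitution would turn it into one.
    """
    hits = {}
    for start in range(0, len(nt) - 8, 3):
        window = nt[start:start + 9]
        if is_sequon(translate(window)):
            continue  # already a sequon, so not a "near" sequon
        for offset in range(9):
            for base in BASES:
                if base == window[offset]:
                    continue
                mutated = window[:offset] + base + window[offset + 1:]
                if is_sequon(translate(mutated)):
                    hits.setdefault(start // 3, []).append((start + offset, base))
    return hits
```

For example, `find_near_sequons("AAAGGTACT")` reports that changing the third nucleotide of the first codon (AAA, lysine) to C or T yields asparagine and therefore an N-G-T sequon. A fuller model in the spirit of the description would then weight such candidates by surface accessibility and site-specific mutation frequency, which this sketch does not attempt.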

Created:

2025-08-14
