Supporting data and code for: Phylogenetic identification of influenza virus candidates for seasonal vaccines
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.x95x69pqh
Summary:
The seasonal influenza (flu) vaccine is designed to protect against the influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine (SVM), to predict future circulation. Over 2016–2020 we obtain accuracies of 0.75–0.89 and area under the curve (AUC) values of 0.83–0.91. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences from the most recent three years also perform well at this task. We identify candidate strains similar to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.
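The summary compares the model against consensus sequences from the most recent three years. As a hedged illustration only, the sketch below computes a majority-rule consensus over sequences pooled from a three-year window. The function names, the equal weighting of every pooled sequence, the tie-breaking rule and the assumption that the sequences are already aligned to equal length are all illustrative choices, not necessarily how the repository builds its consensus.

```python
from collections import Counter


def consensus_sequence(aligned_seqs):
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    At each alignment column the most frequent character wins; ties go to
    the character seen first (Counter.most_common keeps insertion order
    for equal counts). Gap handling is deliberately not treated specially.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


def recent_consensus(seqs_by_year, target_year, window=3):
    """Hypothetical helper: pool sequences from the `window` years before
    `target_year` (assumed equal weighting) and return their consensus."""
    pooled = [
        seq
        for year in range(target_year - window, target_year)
        for seq in seqs_by_year.get(year, [])
    ]
    return consensus_sequence(pooled)


if __name__ == "__main__":
    # Toy, made-up protein fragments keyed by collection year.
    toy = {2017: ["MKTA", "MKTV"], 2018: ["MKSA"], 2019: ["MRTA"]}
    print(recent_consensus(toy, target_year=2020))
```

For the toy input, every column is dominated by one residue, so the pooled consensus is `MKTA`. Pooling sequences with equal weight lets heavily sampled years dominate; a per-year consensus or sampling weights would be equally reasonable alternatives.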
Methods
This repository contains the code, data and materials developed for 'Phylogenetic identification of influenza virus candidates for seasonal vaccines'. We downloaded all hemagglutinin (HA) and neuraminidase (NA) human H3N2 sequences collected from 1980 to February 2020 from the Global Initiative on Sharing Avian Influenza Data (GISAID). Accession numbers and references to the GISAID submitting laboratories for the sequences used in this study are included in this repository. As per GISAID access terms, the sequences used in this study are not reproduced here but may be downloaded from the GISAID server.
This repository contains all the code needed to compute the features, train and test the machine learning models, predict the next year's flu vaccine candidates, and generate the plots for the paper. Derived data, e.g. all the influenza trees for the experiments in the years 2016 to 2020, are also included.
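Testing the models yields the accuracy and AUC figures quoted in the summary. The following is a hedged, standard-library sketch of how both metrics can be computed from a classifier's scores, for example SVM decision values where a positive score means "predicted to circulate". The label convention (1 = circulates in the following season), the decision threshold of 0 and the function names are assumptions for illustration, not the repository's code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half (Mann-Whitney U / (n_pos * n_neg)).

    O(n_pos * n_neg), which is fine for a sketch but slow for large sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.0):
    """Fraction of correct predictions when scores above `threshold`
    (assumed 0, i.e. the sign of an SVM decision value) predict class 1."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Made-up labels and decision values for four hypothetical strains.
    y_true = [1, 0, 1, 0]
    decision = [0.9, -0.3, 0.2, 0.4]
    print(roc_auc(y_true, decision), accuracy(y_true, decision))
```

Here three of the four positive/negative pairs are ranked correctly and three of the four thresholded predictions match, so both metrics are 0.75. AUC depends only on the ranking of scores, whereas accuracy also depends on the chosen threshold, which is why the two ranges reported in the summary differ.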
Created:
2023-12-18