
Supporting data and code for: Phylogenetic identification of influenza virus candidates for seasonal vaccines

Source: DataONE. Updated 2023-12-18; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:00477cd70028c85981367cd2719c47c351ec0e41651f5553f4f7e6f83213cc9e
Description:
The seasonal influenza (flu) vaccine is designed to protect against those influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine, to predict future circulation. We obtain accuracies of 0.75–0.89 (area under the curve, AUC, 0.83–0.91) over 2016–2020. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences among the most recent three years also do well at this task. We identify similar candidate strains to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.

This repository contains the code, data and materials developed for 'Phylogenetic identification of influenza virus candidates for seasonal vaccines'. We downloaded all hemagglutinin (HA) and neuraminidase (NA) human H3N2 sequences collected from 1980 to February 2020 from the Global Initiative on Sharing Avian Influenza Data (GISAID). Accession numbers and references to the GISAID submitting laboratories for the sequences used in this study are included in this repository. As per GISAID access terms, the sequences used in this study are not reproduced here but may be downloaded from the GISAID server.

This repository contains all code to compute the features, train and test the machine learning models, predict the next year's flu vaccine candidates, and generate the plots for the paper. Derived data, e.g. all the influenza trees for the experiments in years 2016 to 2020, are included.

All data files are in CSV format. All code was written in R (open-source), and influenza trees are included in RDATA files to be read into R.
Accession numbers and references to the GISAID submitting laboratories for the sequences used in this study are included in a zip folder. To recreate the analysis in full, these accession numbers may be used to download the influenza sequences directly from GISAID.
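
The description notes that consensus sequences from the most recent three years also perform well as vaccine candidates. As a rough illustration of what a consensus sequence is, the following Python sketch builds a majority-rule consensus from aligned sequences. It is not the repository's code, which is written in R; the function name `consensus_sequence`, the toy sequences, and the alphabetical tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def consensus_sequence(aligned_seqs):
    """Return the majority-rule consensus of equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        top = max(counts.values())
        # Assumption: break ties alphabetically so the result is deterministic.
        consensus.append(min(ch for ch, n in counts.items() if n == top))
    return "".join(consensus)


# Toy aligned fragments, made up for illustration (not GISAID data).
toy_alignment = ["ATGAAG", "ATGACG", "ATCAAG"]
print(consensus_sequence(toy_alignment))
```

In practice the input would be an alignment of the HA or NA sequences downloaded from GISAID for the chosen years, and how gaps and ambiguous characters are handled would need to follow the paper's own procedure.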
Created:
2025-07-25