
Model performance metrics on the testing dataset.

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Model_performance_metrics_on_the_testing_dataset_/30545744
Description:
Background: H3 influenza A viruses (IAV) frequently cross the species barrier, which can be an important factor in sustained transmission and spread. Machine learning methods have been widely explored for host prediction of IAV from genomic data; however, this is often done using data from only one of the eight IAV segments, or by using all available IAV data to predict broad categories of hosts.

Objective: The objective of this study was to combine machine learning algorithms with H3 IAV sequence data from all eight segments to train models for distinct host prediction and to validate model performance.

Methods: For each of the eight IAV genome segments, models were trained on both k-mers and amino acid properties using machine learning algorithms that included random forest and XGBoost. Models were then validated on a test dataset by analyzing their predicted class probabilities, and were subsequently used to investigate between-species transmission patterns in case studies including canine H3N8, swine H3N2 2010.2, and duck H3 sequences.

Results: Models demonstrated strong performance in host prediction across all eight segments on the test dataset, with overall accuracies ranging from 0.995 to 0.997 and κ (kappa) values ranging from 0.984 to 0.990. Misclassified test dataset sequences with high predicted probabilities (> 90%) were checked against the available literature and were found to be frequently associated with between-species transmission events. Between-species transmission patterns in the case study predicted class probabilities were also consistent with the literature, in cases of both correct and incorrect classification.

Conclusions: These models allow rapid and accurate host prediction for H3 IAV datasets from any of the eight IAV segments, and provide a solid framework for identifying variants with higher than typical between-species transmission potential. However, results obtained on selected case studies suggest that further improvements to the training and validation processes should be considered.
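
The description reports overall accuracy and κ (kappa) on the test dataset but does not say how they were computed. The following is a minimal sketch of both metrics, assuming κ here means Cohen's kappa over true versus predicted host labels. The function name and the host labels in the usage example are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Illustrative only: this assumes the reported kappa is Cohen's kappa,
    which the dataset description does not state explicitly.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)

    # Observed agreement: fraction of sequences whose predicted host is correct.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of its marginal
    # frequencies among the true labels and among the predicted labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Only one class appears in both sequences; kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical host labels for four test sequences.
    true_hosts = ["swine", "swine", "avian", "avian"]
    predicted_hosts = ["swine", "avian", "avian", "avian"]
    acc, kappa = accuracy_and_kappa(true_hosts, predicted_hosts)
    print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Because κ discounts the agreement expected by chance, it is typically lower than raw accuracy when host classes are imbalanced, which is consistent with the reported κ range sitting slightly below the accuracy range.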
Created:
2025-11-05