
Biased sampling confounds machine learning prediction of antimicrobial resistance

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/zs2mbjv7dn
Description:
Antimicrobial resistance (AMR) poses a growing threat to human health. Increasingly, genome sequencing is being applied for surveillance of bacterial pathogens, producing a wealth of data to train machine learning (ML) applications to predict AMR and identify resistance determinants. However, bacterial populations are highly structured and sampling is biased towards human disease isolates, meaning samples and derived features are not independent. This is rarely considered in applications of ML to AMR. Here, we demonstrate the confounding effects of sample structure by collecting over 24,000 whole genome sequences and AMR phenotypes from five diverse pathogens and constructing realistic pathological training data where resistance is confounded with phylogeny. We show resulting ML models perform poorly, and increasing the training sample size fails to rescue performance. A comprehensive analysis of 6,740 models identifies species- and drug-specific effects on model accuracy. We provide concrete recommendations for evaluating future ML approaches to AMR.
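
To make the confounding described above concrete, here is a minimal sketch on a toy dataset. It is not the authors' pipeline and does not use the deposited data. All names (`make_isolates`, `fit_marker_lookup`, `leave_clade_out`) and the synthetic isolates are hypothetical, chosen only for illustration. It assumes resistance is perfectly tied to clade, and it compares an evaluation that mixes clades across training and test sets with one that holds out a whole clade at a time.

```python
from collections import Counter, defaultdict


def make_isolates(n_clades=4, per_clade=10):
    """Toy isolates as (clade, lineage_marker, resistant) tuples.

    Assumption for illustration: the only feature is a lineage marker equal
    to the clade id, and resistance is fully determined by clade (even
    clades resistant, odd clades susceptible).
    """
    isolates = []
    for clade in range(n_clades):
        resistant = clade % 2 == 0
        for _ in range(per_clade):
            isolates.append((clade, clade, resistant))
    return isolates


def fit_marker_lookup(train):
    """'Model' that memorises the majority label seen for each marker."""
    by_marker = defaultdict(Counter)
    overall = Counter()
    for _, marker, resistant in train:
        by_marker[marker][resistant] += 1
        overall[resistant] += 1
    # Unseen markers fall back to the majority class of the training set.
    default = overall.most_common(1)[0][0]
    table = {m: c.most_common(1)[0][0] for m, c in by_marker.items()}
    return table, default


def accuracy(model, test):
    table, default = model
    hits = sum(table.get(marker, default) == resistant
               for _, marker, resistant in test)
    return hits / len(test)


def interleaved_split(isolates, every=5):
    """Clade-mixed split: every `every`-th isolate goes to the test set."""
    test = isolates[::every]
    train = [x for i, x in enumerate(isolates) if i % every != 0]
    return train, test


def leave_clade_out(isolates):
    """Yield (train, test) pairs where the test set is one whole clade."""
    for held_out in sorted({clade for clade, _, _ in isolates}):
        train = [x for x in isolates if x[0] != held_out]
        test = [x for x in isolates if x[0] == held_out]
        yield train, test


isolates = make_isolates()
train, test = interleaved_split(isolates)
mixed_acc = accuracy(fit_marker_lookup(train), test)
clade_accs = [accuracy(fit_marker_lookup(tr), te)
              for tr, te in leave_clade_out(isolates)]
print("clade-mixed split accuracy:", mixed_acc)
print("leave-one-clade-out accuracies:", clade_accs)
```

In this deliberately extreme toy case the clade-mixed split looks perfect because the model has only learned lineage identity, while every held-out clade is misclassified. Real data will be far less clear-cut, but the gap between the two evaluations is the quantity worth inspecting.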
Created:
2025-10-27