Supporting data for "Interpreting k-mer based signatures for antibiotic resistance prediction"
Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100783
Description:
Recent years have witnessed the development of several k-mer-based approaches that aim to predict phenotypic traits of bacteria from their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are generally not straightforward to interpret, because the relationship between an actual genetic determinant and its representation as k-mers is hard to decipher.

We propose a simple and computationally efficient strategy for coping with the high correlation inherent to k-mer-based representations in supervised machine learning models, leading to concise and easily interpretable signatures. We demonstrate the benefit of this approach on the task of predicting the antibiotic resistance profile of a Klebsiella pneumoniae strain from its genome, where our method leads to signatures defined as weighted linear combinations of genetic elements that can easily be identified as genuine antibiotic resistance determinants, with state-of-the-art predictive performance.

By enhancing the interpretability of genomic k-mer-based antibiotic resistance prediction models, our approach improves their clinical utility and will hence facilitate their adoption in routine diagnostics by clinicians and microbiologists. While antibiotic resistance was the motivating application, the method is generic and can be transposed to any other bacterial trait. An R package implementing our method is available on GitLab.
Provider:
GigaScience Database
Created:
2020-08-26



