DataSheet1_A pangenome analysis of ESKAPE bacteriophages: the underrepresentation may impact machine learning models.pdf

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/DataSheet1_A_pangenome_analysis_of_ESKAPE_bacteriophages_the_underrepresentation_may_impact_machine_learning_models_pdf/26074402
Description:
Bacteriophages are the most prevalent biological entities in the biosphere. However, limitations in both medical relevance and sequencing technologies have led to a systematic underestimation of the genetic diversity within phages. This underrepresentation not only creates a significant gap in our understanding of phage roles across diverse biosystems but also introduces biases in computational models reliant on these data for training and testing. In this study, we focused on publicly available genomes of bacteriophages infecting high-priority ESKAPE pathogens to show the extent and impact of this underrepresentation. First, we demonstrate a stark underrepresentation of ESKAPE phage genomes within the public genome and protein databases. Next, a pangenome analysis of these ESKAPE phages reveals extensive sharing of core genes among phages infecting the same host. Furthermore, genome analyses and clustering highlight close nucleotide-level relationships among the ESKAPE phages, raising concerns about the limited diversity within current public databases. Lastly, we uncover a scarcity of unique lytic phages and phage proteins with antimicrobial activities against ESKAPE pathogens. This comprehensive analysis of the ESKAPE phages underscores the severity of underrepresentation and its potential implications. This lack of diversity in phage genomes may restrict the resurgence of phage therapy and cause biased outcomes in data-driven computational models due to incomplete and unbalanced biological datasets.
Created:
2024-06-21