Data Sheet 1_In silico performance of a targeted enriched metagenomics approach to infer Mycoplasma bovis strains in milk.pdf
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_In_silico_performance_of_a_targeted_enriched_metagenomics_approach_to_infer_Mycoplasma_bovis_strains_in_milk_pdf/31804177
Description:
Strain variation plays a key role in the microbial epidemiology of Mycoplasma bovis, yet its true diversity remains incompletely characterized, partly due to limitations of culture-based methods. This study evaluated the in silico suitability of a targeted enrichment (TE) shotgun sequencing approach to detect and classify M. bovis strains in milk metagenomic samples. As a proof of concept, the accuracy of this approach was assessed using milk-derived M. bovis strains. A total of 620 M. bovis whole-genome sequences were downloaded from NCBI, of which 162 (26.1%) originated from milk samples. Genomes were grouped into Genomically Clustered Sequence Variants (GSVs) using MashTree and TreeCluster to enable strain-level classification. To simulate TE sequencing data, genomes from different milk-associated GSVs were randomly selected and fragmented in silico into 150-bp reads. Mock milk samples were generated by sampling reads with replacement from these genomes. Sequencing depth was modeled using a Poisson distribution, while mixed-strain DNA samples were simulated by including 1, 3, 6, or 9 GSVs per sample. Enrichment proportions were set at 0.3, 0.5, 0.7, and 0.9. Two classification tools, Kraken2 and Themisto/mSWEEP, were evaluated for their ability to detect and classify the simulated TE reads. Themisto/mSWEEP consistently outperformed Kraken2, achieving an average read classification accuracy of 84.9% compared with 1.4% for Kraken2. Sensitivity for Themisto/mSWEEP was 100% with a single spiked GSV and declined slightly to 97.0% with nine GSVs, whereas Kraken2 achieved sensitivities of only 17.3% and 4.7%, respectively. Positive predictive value (PPV) showed a similar pattern: 98% for Themisto/mSWEEP vs. 4.7% for Kraken2 with a single GSV, and 65.5% vs. 10% with nine GSVs. While Kraken2's PPV increased slightly with additional GSVs, Themisto/mSWEEP's PPV decreased. Both methods maintained high specificity and negative predictive value (>91%) across all scenarios. 
Enrichment proportion had no measurable effect on performance. Overall, Themisto/mSWEEP demonstrated superior accuracy for GSV-level identification of M. bovis strains. Enrichment to at least 30% of total reads was sufficient to recover strain-level data. Further work is needed to assess the biological relevance and practical applications of these genomic clusters.
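The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the overlapping-window fragmentation, the uniform choice among GSVs, the fixed rounding of the enriched fraction, and the placeholder background reads are all assumptions made for illustration. Only the 150-bp read length, sampling with replacement, Poisson-distributed depth, the enrichment proportion, and the four detection metrics come from the description.

```python
import random

READ_LEN = 150  # read length stated in the description


def fragment_genome(genome, read_len=READ_LEN):
    """Cut a genome into all overlapping windows of read_len (assumed scheme)."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


def poisson_draw(mean, rng):
    """Draw from Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_mock_sample(target_genomes, background_reads, mean_depth, enrichment, rng):
    """Return (reads, labels) for one mock sample.

    target_genomes: dict mapping a GSV label to its genome string
    background_reads: non-target reads (hypothetical stand-in for milk background)
    mean_depth: Poisson mean of the total read count
    enrichment: fraction of reads drawn from the target genomes (e.g. 0.3 to 0.9)
    """
    total = poisson_draw(mean_depth, rng)
    n_target = round(total * enrichment)  # assumption: fixed fraction, not a random split
    pools = {gsv: fragment_genome(g) for gsv, g in target_genomes.items()}
    gsvs = list(pools)
    reads, labels = [], []
    for _ in range(n_target):
        gsv = rng.choice(gsvs)                # assumption: GSVs are equally abundant
        reads.append(rng.choice(pools[gsv]))  # sampling with replacement
        labels.append(gsv)
    for _ in range(total - n_target):
        reads.append(rng.choice(background_reads))
        labels.append("background")
    return reads, labels


def detection_metrics(truth, predicted, all_gsvs):
    """Sensitivity, PPV, specificity and NPV for GSV presence/absence calls."""
    tp = len(truth & predicted)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(all_gsvs - truth - predicted)

    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is zero

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
    }
```

In this sketch, a sample spiked with 3 GSVs at an enrichment proportion of 0.5 would be produced by passing three entries in `target_genomes` and `enrichment=0.5`. The set of GSVs reported by a classifier can then be compared with the spiked set using `detection_metrics`.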
Created:
2026-03-18



