Annotation-free prediction of microbial dioxygen utilization
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Annotation-free_prediction_of_microbial_dioxygen_utilization/26065345
Description:
Aerobes require dioxygen to grow; anaerobes do not. But nearly all microbes (aerobes, anaerobes, and facultative organisms alike) express enzymes whose substrates include oxygen, if only for detoxification. This makes it difficult to determine which organisms are aerobic from genomic data alone. The difficulty can be overcome by noting that oxygen utilization has wide-ranging effects on microbes: aerobes, for example, typically have larger genomes that encode distinctive oxygen-utilizing enzymes. These effects permit high-quality prediction of oxygen utilization from annotated genome sequences, with several models achieving ~80% accuracy on a ternary classification task on which blind guessing is only 33% accurate. Since genome annotation is compute-intensive and relies on many assumptions, we asked whether annotation-free methods also perform well. We found that simple, efficient models based entirely on genome sequence content, such as triplets of amino acids, perform as well as intensive annotation-based classifiers, enabling rapid processing of genomes. We further show that amino acid trimers are useful because they encode information about both protein composition and phylogeny. To showcase the utility of rapid prediction, we estimated the prevalence of aerobes and anaerobes in diverse natural environments cataloged in the Earth Microbiome Project. Focusing on a well-studied oxygen gradient in the Black Sea, we found a quantitative correspondence between local chemistry (the ratio of oxygen to sulfide concentrations) and the composition of microbial communities. We therefore suggest that statistical methods like ours might be used to estimate, or "sense," pivotal features of the chemical environment from DNA sequencing data.
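The description does not specify the model, so the following is only a minimal sketch of what an annotation-free, trimer-based classifier could look like. It assumes the input for each genome is a list of protein sequences already translated from that genome, and it uses a nearest-centroid rule with cosine similarity as a stand-in classifier. The helper names (`trimer_profile`, `fit_centroids`, `predict`, `cosine`) and the choice of classifier are hypothetical illustrations, not the authors' published method. With three roughly balanced classes, uniform guessing is right 1/3 of the time, which is the 33% baseline quoted above.

```python
from collections import Counter
from itertools import product
import math

# The 20 standard amino acids; trimers containing anything else (e.g. X, *, U) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 possible amino acid trimers, in a fixed order so vectors are comparable.
TRIMERS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
TRIMER_INDEX = {t: i for i, t in enumerate(TRIMERS)}


def trimer_profile(proteins):
    """Hypothetical helper: normalized 8000-dim trimer frequency vector for one genome.

    Counts overlapping amino acid triplets across all proteins, then divides by the
    total so that genomes with different numbers of proteins are comparable.
    """
    counts = [0] * len(TRIMERS)
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - 2):
            idx = TRIMER_INDEX.get(seq[i:i + 3])
            if idx is not None:  # skip trimers with non-standard residues
                counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TRIMERS)
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fit_centroids(profiles, labels):
    """Average the trimer profiles of the training genomes within each class label."""
    sums, n = {}, Counter()
    for p, y in zip(profiles, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        n[y] += 1
    return {y: [x / n[y] for x in acc] for y, acc in sums.items()}


def predict(centroids, profile):
    """Return the label (e.g. aerobe, anaerobe, facultative) of the most similar centroid."""
    return max(centroids, key=lambda y: cosine(centroids[y], profile))
```

Because each profile is normalized to sum to one, it discards genome size, which the description notes is itself informative; a fuller model might append it as an extra feature.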
Created:
2024-06-22



