
Multigenomic Entropy Based Score (MEBS): The molecular reconstruction of the sulfur cycle

Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100357
Description:
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet inferring metabolic capabilities in such datasets remains challenging.

We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare and infer complex metabolic pathways in large omic datasets, including entire biogeochemical cycles. MEBS is open source and available at https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences, such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2,107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility and robustness were evaluated using several approaches, including random sampling, linear regression models, Receiver Operating Characteristic plots and the Area Under the Curve (AUC) metric. Our results support the broad applicability of this algorithm to accurately classify (AUC=0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents or deep-sea sediment.
CONCLUSIONS: Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
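
The entropy-based scoring described above can be illustrated with a minimal Python sketch. It is not the MEBS implementation: it assumes that each domain's contribution is the presence term p * log2(p / q) of the relative entropy, where p is the fraction of sulfur genomes containing the domain and q is the fraction of all reference genomes containing it, and that a sample's score is the sum of the entropies of the domains it contains. The function names, the placeholder domain `PF_HYPOTHETICAL` and all frequencies are hypothetical and chosen only for illustration; PF04358 (DsrC) is the only identifier taken from the description.

```python
import math


def presence_entropy(p, q, eps=1e-9):
    """Presence term of the relative entropy, in bits: p * log2(p / q).

    p: fraction of target (e.g. sulfur) genomes containing the domain.
    q: fraction of all reference genomes containing the domain.
    Positive values indicate enrichment in the target genomes,
    negative values indicate depletion.
    """
    if p <= 0.0:
        return 0.0  # limit of p * log2(p / q) as p approaches 0
    q = max(q, eps)  # avoid division by zero for unseen domains
    return p * math.log2(p / q)


def domain_entropies(target_freq, background_freq):
    """Map each domain to its presence entropy."""
    return {
        domain: presence_entropy(p, background_freq.get(domain, 0.0))
        for domain, p in target_freq.items()
    }


def sample_score(domains_in_sample, entropies):
    """Sum the entropies of the scored domains found in a sample (assumed scoring rule)."""
    return sum(h for domain, h in entropies.items() if domain in domains_in_sample)


# Toy frequencies, invented for illustration only.
target = {"PF04358": 0.5, "PF_HYPOTHETICAL": 0.25}
background = {"PF04358": 0.25, "PF_HYPOTHETICAL": 0.5}

entropies = domain_entropies(target, background)
print(entropies)
print(sample_score({"PF04358"}, entropies))
```

With these toy values, the domain that is twice as frequent in the target genomes as in the background receives a positive entropy, and the depleted one a negative entropy, so a sample carrying only enriched domains scores higher than one carrying depleted ones.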
Provider:
GigaScience Database
Created:
2017-10-05