Tetranucleotide frequencies and ratios of frequencies
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.tb2rbp0c8
Description:
Microbiomes are constrained by physicochemical conditions, nutrient regimes, and community interactions across diverse environments, yet genomic signatures of this adaptation remain unclear. Metagenome sequencing is a powerful technique to analyze genomic content in the context of natural environments, establishing concepts of microbial ecological trends. Here, we developed a data discovery tool - a tetranucleotide-informed metagenome stability diagram - that is publicly available in the Integrated Microbial Genomes and Microbiomes (IMG/M) platform for metagenome-ecosystem analyses. We analyzed the tetranucleotide frequencies from quality-filtered and unassembled sequence data of over 12,000 metagenomes to assess ecosystem-specific microbial community composition and function. We found that tetranucleotide frequencies can differentiate communities across various natural environments, and that specific functional and metabolic trends can be observed in this structuring. Our tool places metagenomes sampled from diverse environments into clusters and along gradients of tetranucleotide frequency similarity, suggesting microbiome community compositions specific to gradient conditions. Within the resulting metagenome clusters, we identify protein-coding gene identifiers that are most differentiated between ecosystem classifications. We plan for annual updates to the metagenome stability diagram in IMG/M with new data, allowing for refinement of the ecosystem classifications delineated here. This framework has the potential to inform future studies on microbiome engineering, bioremediation, and the prediction of microbial community responses to environmental change.
Methods
Filtered sequencing reads from the metagenome datasets, processed following the standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (Huntemann et al. 2016), were analyzed for tetranucleotide counts with KMC (version 3.1.1) (Kokot, Długosz, and Deorowicz 2017). Tetranucleotide counts were converted into frequencies (each 4-mer count divided by the total count of all 4-mers) and ratios of frequencies (each 4-mer frequency divided by each of the other 4-mer frequencies).
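The conversion from counts to frequencies and ratios can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: the function names and the input dictionary are hypothetical, the counts are assumed to have already been exported from the k-mer counter into a plain mapping, and the ratios are read as every ordered pair of distinct 4-mers, which is one interpretation of the description above.

```python
from itertools import permutations


def tetranucleotide_frequencies(counts):
    """Convert raw 4-mer counts into frequencies (count / total of all counts)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no 4-mers were counted")
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_ratios(freqs):
    """Return freq(a) / freq(b) for every ordered pair of distinct 4-mers.

    Pairs whose denominator is zero are skipped (an assumption of this
    sketch; the source does not say how such cases are handled).
    """
    ratios = {}
    for a, b in permutations(sorted(freqs), 2):
        if freqs[b] > 0:
            ratios[(a, b)] = freqs[a] / freqs[b]
    return ratios


# Hypothetical toy counts; a real metagenome would have up to 256 distinct 4-mers.
example_counts = {"AAAA": 4, "ACGT": 2, "TTTT": 2}
example_freqs = tetranucleotide_frequencies(example_counts)
example_ratios = frequency_ratios(example_freqs)
```

With the full set of 256 tetranucleotides, this reading yields 256 × 255 ordered ratios per metagenome, so in practice one might keep only one ratio per unordered pair.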
Created:
2025-01-03



