
Tetranucleotide frequencies and ratios of frequencies

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.tb2rbp0c8
Description:
Microbiomes are constrained by physicochemical conditions, nutrient regimes, and community interactions across diverse environments, yet genomic signatures of this adaptation remain unclear. Metagenome sequencing is a powerful technique for analyzing genomic content in the context of natural environments and for establishing concepts of microbial ecological trends. Here, we developed a data discovery tool, a tetranucleotide-informed metagenome stability diagram, that is publicly available in the Integrated Microbial Genomes and Microbiomes (IMG/M) platform for metagenome-ecosystem analyses. We analyzed the tetranucleotide frequencies from quality-filtered and unassembled sequence data of over 12,000 metagenomes to assess ecosystem-specific microbial community composition and function. We found that tetranucleotide frequencies can differentiate communities across various natural environments, and that specific functional and metabolic trends can be observed in this structuring. Our tool places metagenomes sampled from diverse environments into clusters and along gradients of tetranucleotide frequency similarity, suggesting microbiome community compositions specific to gradient conditions. Within the resulting metagenome clusters, we identify protein-coding gene identifiers that are most differentiated between ecosystem classifications. We plan annual updates to the metagenome stability diagram in IMG/M with new data, allowing for refinement of the ecosystem classifications delineated here. This framework has the potential to inform future studies on microbiome engineering, bioremediation, and the prediction of microbial community responses to environmental change.

Methods

For each metagenome dataset, sequencing reads filtered according to the standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (Huntemann et al. 2016) were analyzed for tetranucleotide counts with KMC (version 3.1.1) (Kokot, Długosz, and Deorowicz 2017).
Tetranucleotide counts were converted into frequencies (each 4-mer count divided by the total count of all 4-mers) and ratios of frequencies (each 4-mer frequency divided by the frequency of each of the other 4-mers).
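
The conversion from counts to frequencies and frequency ratios can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The counting here is done in plain Python as a stand-in for KMC, the function names (`all_tetramers`, `count_tetramers`, `frequencies`, `frequency_ratios`) are hypothetical, reverse complements are not merged, and reading the "ratios of frequencies" as all ordered pairs of distinct 4-mers is an assumption.

```python
from itertools import product


def all_tetramers():
    """Return all 256 possible 4-mers over the alphabet A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=4)]


def count_tetramers(reads):
    """Count overlapping 4-mers in an iterable of read sequences.

    Windows containing anything other than A/C/G/T (for example N) are skipped.
    Assumption: strands are not collapsed into canonical k-mers.
    """
    counts = dict.fromkeys(all_tetramers(), 0)
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - 3):
            kmer = seq[i:i + 4]
            if kmer in counts:
                counts[kmer] += 1
    return counts


def frequencies(counts):
    """Divide each 4-mer count by the total count of all 4-mers."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid 4-mers were counted")
    return {kmer: c / total for kmer, c in counts.items()}


def frequency_ratios(freqs):
    """Ratio of each 4-mer frequency to every other 4-mer frequency.

    Pairs whose denominator is zero are omitted (an illustrative choice).
    """
    ratios = {}
    for a, fa in freqs.items():
        for b, fb in freqs.items():
            if a != b and fb > 0:
                ratios[(a, b)] = fa / fb
    return ratios


# Toy usage on a single short read.
toy_counts = count_tetramers(["ACGTACGT"])
toy_freqs = frequencies(toy_counts)
toy_ratios = frequency_ratios(toy_freqs)
```

For real metagenomes, most of the 256 × 255 ordered ratios would be defined; how zero-frequency 4-mers are handled in the published dataset is not stated in the description above.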
Created:
2025-01-03