
Tetranucleotide frequencies and ratios of frequencies

DataONE · Updated 2025-01-03 · Indexed 2025-04-26
Download link:
https://search.dataone.org/view/sha256:0b4c23ba47e9716288c4d3c6f89cb3db2d6604b157d2dd17e0b99fc7ed7f05c0
Resource description:
Microbiomes are constrained by physicochemical conditions, nutrient regimes, and community interactions across diverse environments, yet the genomic signatures of this adaptation remain unclear. Metagenome sequencing is a powerful technique for analyzing genomic content in the context of natural environments and for establishing microbial ecological trends. Here, we developed a data discovery tool, a tetranucleotide-informed metagenome stability diagram, that is publicly available in the Integrated Microbial Genomes and Microbiomes (IMG/M) platform for metagenome-ecosystem analyses. We analyzed the tetranucleotide frequencies from quality-filtered, unassembled sequence data of over 12,000 metagenomes to assess ecosystem-specific microbial community composition and function. We found that tetranucleotide frequencies can differentiate communities across various natural environments, and that specific functional and metabolic trends can be observed in this structuring. Our tool places m...

Metagenome datasets (filtered sequencing reads produced following the standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline; Huntemann et al. 2016) were analyzed for tetranucleotide counts with KMC (version 3.1.1) (Kokot, Długosz, and Deorowicz 2017). Tetranucleotide counts were converted into frequencies (e.g. 4-mer count divided by the total count of all 4-mers) and ratios of frequencies (e.g. 4-mer frequencies divided by all other permutations of 4-mer frequencies).

# Data from: Tetranucleotide frequencies and ratios of frequencies

[https://doi.org/10.5061/dryad.tb2rbp0c8](https://doi.org/10.5061/dryad.tb2rbp0c8)

## Description of the data and file structure

Publicly available metagenomes sequenced at the JGI and added to the IMG database before April 10, 2024 (the date of collection) were considered for this analysis (Chen et al. 2023). These data collection criteria yielded 15,208 metagenome datasets labeled as “Metagenome Analysis” in their GOLD Analysis Project Type (Mukherjee et al. 2024). Metagenomes with a GOLD Ecosystem classification of “Engineered” or “Host-associated”, or with a GOLD Ecosystem Type classification of “Nest”, were removed from the collection because their non-natural or host-restricted environmental properties would alter the interpretation of tetranucleotide frequency plots as reflecting physicochemical pressures. After data ...
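
The conversion from tetranucleotide counts to frequencies and ratios of frequencies described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy counts are hypothetical, and it assumes that "ratios of frequencies" means the ratio of each 4-mer frequency to the frequency of every other 4-mer (all ordered pairs of distinct 4-mers). Skipping pairs with a zero denominator is likewise an assumption. In practice the counts would come from a k-mer counter such as KMC rather than being typed in by hand.

```python
from itertools import permutations


def tetranucleotide_frequencies(counts):
    """Convert raw 4-mer counts into frequencies (count / total of all 4-mer counts)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no 4-mers were counted")
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_ratios(freqs):
    """Ratio of each 4-mer frequency to every other 4-mer frequency.

    Assumption: ratios are taken over all ordered pairs of distinct 4-mers;
    pairs whose denominator frequency is zero are skipped.
    """
    return {
        (a, b): freqs[a] / freqs[b]
        for a, b in permutations(freqs, 2)
        if freqs[b] > 0
    }


# Toy counts for three 4-mers (hypothetical values, not taken from the dataset).
counts = {"AAAA": 6, "ACGT": 3, "TTTT": 1}
freqs = tetranucleotide_frequencies(counts)
ratios = frequency_ratios(freqs)
```

With all 256 possible 4-mers, this pairwise definition yields 256 × 255 = 65,280 ordered ratios per metagenome, so a real analysis would likely use an array-based representation rather than a dictionary.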
Created:
2025-01-04