Tetranucleotide frequencies and ratios of frequencies
Source: DataONE · Updated 2025-04-16 · Indexed 2025-04-26
Download link:
https://search.dataone.org/view/sha256:de4fc11d95a700e2299296ca492ddc6819b82e4cacf40934bd92badfc8d13e2f
Resource description:
Microbiomes are constrained by physicochemical conditions, nutrient regimes, and community interactions across diverse environments, yet genomic signatures of this adaptation remain unclear. Metagenome sequencing is a powerful technique for analyzing genomic content in the context of natural environments and for establishing concepts of microbial ecological trends. Here, we developed a data discovery tool, a tetranucleotide-informed metagenome stability diagram, that is publicly available in the Integrated Microbial Genomes and Microbiomes (IMG/M) platform for metagenome-ecosystem analyses. We analyzed the tetranucleotide frequencies from quality-filtered and unassembled sequence data of over 12,000 metagenomes to assess ecosystem-specific microbial community composition and function. We found that tetranucleotide frequencies can differentiate communities across various natural environments, and that specific functional and metabolic trends can be observed in this structuring. Our tool places m...

For each metagenome dataset, sequencing reads filtered according to the standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (Huntemann et al. 2016) were analyzed for tetranucleotide counts with KMC (version 3.1.1) (Kokot, Długosz, and Deorowicz 2017). Tetranucleotide counts were converted into frequencies (e.g. 4-mer count divided by the total count of all 4-mers) and ratios of frequencies (e.g. 4-mer frequencies divided by all other permutations of 4-mer frequencies).

# Data from: Tetranucleotide frequencies and ratios of frequencies
[https://doi.org/10.5061/dryad.tb2rbp0c8](https://doi.org/10.5061/dryad.tb2rbp0c8)
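
The conversion from counts to frequencies and ratios described in the methods summary can be illustrated with a minimal sketch. It is not the authors' pipeline: a plain sliding window stands in for KMC, all function names are hypothetical, and it assumes non-canonical 4-mers (no reverse-complement merging), that windows containing ambiguous bases are skipped, and that "ratios of frequencies" means one ratio per ordered pair of distinct 4-mers.

```python
from collections import Counter
from itertools import permutations, product

K = 4
# All 4**4 = 256 possible tetranucleotides over the DNA alphabet.
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def count_kmers(reads, k=K):
    """Count k-mers with a sliding window (illustrative stand-in for KMC)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases such as N are skipped.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def to_frequencies(counts):
    """Frequency = 4-mer count divided by the total count of all 4-mers."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers were counted")
    return {kmer: counts.get(kmer, 0) / total for kmer in ALL_KMERS}


def frequency_ratios(freqs):
    """Ratio for every ordered pair of distinct 4-mers.

    Assumption: pairs whose denominator frequency is zero are omitted.
    """
    return {
        (a, b): freqs[a] / freqs[b]
        for a, b in permutations(freqs, 2)
        if freqs[b] > 0
    }


# Toy reads for illustration only; these are not from the dataset.
reads = ["ACGTACGT", "TTTTACGN"]
counts = count_kmers(reads)
freqs = to_frequencies(counts)
ratios = frequency_ratios(freqs)
```

With 256 tetranucleotides, a sample in which every 4-mer occurs would yield 256 × 255 = 65,280 ordered ratios under this reading; whether the published files store all of them or only one ratio per unordered pair is not stated here.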
## Description of the data and file structure
Publicly available metagenomes sequenced at the JGI and added to the IMG database before April 10, 2024 (date of collection) were considered for this analysis (Chen et al. 2023). These data collection criteria yielded 15,208 metagenome datasets labeled "Metagenome Analysis" as their GOLD Analysis Project Type (Mukherjee et al. 2024). Metagenomes with a GOLD Ecosystem classification of "Engineered" or "Host-associated", or a GOLD Ecosystem Type classification of "Nest", were removed from our metagenome collection because their non-natural or host-restricted environments would confound the interpretation of tetranucleotide frequency plots as reflecting physicochemical pressures. After data ...
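
The exclusion rules above can be expressed as a small filter. This is a sketch under stated assumptions rather than the authors' code: the record layout, the field names (`analysis_project_type`, `ecosystem`, `ecosystem_type`), and the sample records are all hypothetical, and only the three excluded labels and the "Metagenome Analysis" project type come from the description.

```python
# Labels taken from the dataset description.
EXCLUDED_ECOSYSTEMS = {"Engineered", "Host-associated"}
EXCLUDED_ECOSYSTEM_TYPES = {"Nest"}
REQUIRED_PROJECT_TYPE = "Metagenome Analysis"


def keep_metagenome(record):
    """Return True if a metadata record passes the described criteria.

    Field names are hypothetical; real GOLD/IMG exports may differ.
    """
    return (
        record.get("analysis_project_type") == REQUIRED_PROJECT_TYPE
        and record.get("ecosystem") not in EXCLUDED_ECOSYSTEMS
        and record.get("ecosystem_type") not in EXCLUDED_ECOSYSTEM_TYPES
    )


# Made-up records for illustration only.
records = [
    {"id": "example-1", "analysis_project_type": "Metagenome Analysis",
     "ecosystem": "Environmental", "ecosystem_type": "Freshwater"},
    {"id": "example-2", "analysis_project_type": "Metagenome Analysis",
     "ecosystem": "Engineered", "ecosystem_type": "Bioreactor"},
    {"id": "example-3", "analysis_project_type": "Metagenome Analysis",
     "ecosystem": "Host-associated", "ecosystem_type": "Digestive system"},
    {"id": "example-4", "analysis_project_type": "Metagenome Analysis",
     "ecosystem": "Environmental", "ecosystem_type": "Nest"},
]

kept_ids = [r["id"] for r in records if keep_metagenome(r)]
```

Keeping the excluded labels in named sets makes the criteria easy to audit against the text if the classification vocabulary changes.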
Created:
2025-04-17



