Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJEB41812
Description:
Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to exponential growth in the raw sequence data deposited in online repositories. Metagenomic and metatranscriptomic datasets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these datasets contain a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called ‘dark matter’ is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to forty distinct studies, comprising 963 samples and covering ten different human biomes, including fecal, oral, lung, skin and circulatory system microbiomes. The framework was used to determine the proportion of taxonomically unknown sequences present within samples, and to compare such sequences both within and across assembled metagenomes. We found that whilst the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for the less well-studied skin and oral biomes, which have more interactions with the environment. The publicly available datasets used have not previously been systematically mined to quantify and compare such dark matter. Typically, these unknown sequences are found in several microbiomes and potentially belong to unidentified novel microbes that we interact with on a daily basis. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing.
Created:
2022-02-28



