Simulated dataset (Almeida et al., 2018 - GigaScience) treated with various clustering programs to evaluate ReClustOR efficiency and consistency
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3355910
Description:
ReClustOR is a novel clustering method that overcomes some of the problems associated with classical 'heuristic' clustering methods and consequently increases the stability and quality of the reconstructed OTUs. Moreover, the OTU database defined with ReClustOR can be used as a reference and gradually enriched with new studies and samples. In this way, huge datasets like the Earth Microbiome Project can easily be used as references for smaller projects, thereby increasing the quality of comparisons between studies and datasets.
Here, we propose a new approach called ReClustOR (for RE-CLUSTering method using an Open-Reference approach) to improve OTU consistency (see https://doi.org/10.5281/zenodo.2597402). This new strategy combines two of the previously described clustering methods. First, a classical clustering method (e.g. SWARM or VSEARCH) is used to define OTU centroids and create a reference database. Second, a closed- or open-reference method (depending on the user's choice) is applied to all reads that are not considered OTU centroids. In contrast to classical clustering methods, each read is compared to all centroids using a distance-based greedy clustering technique (Edgar, 2010; He et al., 2015) and then assigned to the nearest one, thereby fixing erroneous assignments of reads to OTUs.
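The second step described above (comparing each non-centroid read to all centroids and assigning it to the nearest one) can be illustrated with a minimal Python sketch. This is not the ReClustOR implementation: the `identity` function is a toy position-wise similarity standing in for a real sequence alignment, and the names `reassign`, `threshold` and `open_reference` are hypothetical, chosen only for this example.

```python
from typing import List, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Toy similarity (assumption): matching positions over the longer length."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def reassign(
    reads: List[str],
    centroids: List[str],
    threshold: float = 0.97,
    open_reference: bool = True,
) -> Tuple[List[Optional[int]], List[str]]:
    """Assign each read to its nearest centroid.

    Returns one centroid index per read (None if unassigned) and the
    possibly extended centroid list.
    """
    centroids = list(centroids)  # copy so the caller's list is not modified
    assignment: List[Optional[int]] = []
    for read in reads:
        # Compare the read to ALL centroids and keep the best match,
        # rather than stopping at the first acceptable one.
        best_idx, best_score = None, -1.0
        for idx, centroid in enumerate(centroids):
            score = identity(read, centroid)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            assignment.append(best_idx)
        elif open_reference:
            # Open-reference mode: an unmatched read founds a new OTU.
            centroids.append(read)
            assignment.append(len(centroids) - 1)
        else:
            # Closed-reference mode: an unmatched read stays unassigned.
            assignment.append(None)
    return assignment, centroids
```

Scanning every centroid before deciding is the point of the sketch: a read is never locked to the first centroid that happens to pass the threshold, which is the kind of erroneous assignment the description says the re-clustering step is meant to fix.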
To highlight the improvements provided by ReClustOR in describing microbial diversity in terms of ecological diversity metrics (e.g. richness, OTU composition, Shannon, 1/Simpson) and taxonomic composition, a simulated dataset was subjected to: (i) ESV definition, (ii) multiple conventional de novo methods (i.e. a homemade de novo clustering close to CRUNCHCLUST, VSEARCH and SWARM), and (iii) ReClustOR computation. The simulated dataset (Almeida et al., 2018) contains a diverse set of genera commonly found in three different ecosystems: human gut, ocean and soil. The clustering methods were compared for: (i) their ability to describe microbial richness, (ii) the congruence between OTU assignments and sequence taxonomy, (iii) the robustness of each defined OTU, and (iv) their ability to efficiently describe the microbial community based on OTU composition.
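For readers unfamiliar with the diversity metrics named above, the following Python sketch computes richness, the Shannon index and the inverse Simpson index (1/Simpson) from a vector of per-OTU read counts. It uses the standard textbook definitions with the natural logarithm; the function names are hypothetical, and the dataset's own analysis scripts may compute these metrics differently (for example after rarefaction).

```python
import math
from typing import Sequence


def richness(counts: Sequence[int]) -> int:
    """Number of OTUs observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts: Sequence[int]) -> float:
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)
```

For a perfectly even community of S OTUs, the Shannon index equals ln(S) and the inverse Simpson index equals S, which gives a quick sanity check when comparing the output of different clustering methods.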
The simulated dataset (00_Raw_data) and all analysis steps are provided here so that they can be reused to test ReClustOR and to better understand the files and data produced by this program. More details are available in the Tree_of_data.tree file.
Created:
2020-01-24



