2014 Env Microbiol, Supplementary Figures (S1-S7)
Source: Figshare. Updated 2016-01-19; indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/2014_Env_Microbiol_Supplementary_Figures_S1_S7_/1219368/1
Description:
# Figure_S1.pdf
Adjusted Mutual Information (AMI) between methods across thresholds when clustering the HSM dataset. Equivalent to Figure 4 in the main text. Raw AMI values provided in Table S4.

# Figure_S2.pdf
Normalized Mutual Information (NMI) between methods across thresholds when clustering the HSM dataset. Equivalent to Figure 4 in the main text. Raw NMI values provided in Table S5.

# Figure_S3.pdf
Adjusted Rand Index (ARI) between methods across thresholds when clustering the HSM dataset. Equivalent to Figure 4 in the main text. Raw ARI values provided in Table S6.

# Figure_S4.pdf
Normalized Mutual Information (NMI) between methods across thresholds when clustering the global dataset of 887,870 16S sequences. Equivalent to Figure 4 in the main text. Raw NMI values provided in Table S8.

# Figure_S5.pdf
Adjusted Rand Index (ARI) between methods across thresholds when clustering the global dataset of 887,870 16S sequences. Equivalent to Figure 4 in the main text. Raw ARI values provided in Table S9.

# Figure_S6.pdf
Pairwise similarities between clustering methods, expressed as absolute differences in partition similarities to other methods. For every pair of clustering methods, differences in partition similarities (expressed as AMI) to all methods across thresholds are shown as histograms. For example, the top left subgraph shows differences between AL and all other methods; it shows that CD-HIT and AL provide very similar AMI values against other methods across thresholds, although CD-HIT AMI values tend to be slightly lower (peak shifted to the left). In other words, the subplots indicate how similarly pairs of methods behave, using partition similarities to other methods across thresholds as reference.

# Figure_S7.pdf
Differential filtering for ‘chimeric’ sequences by UCHIME and UPARSE. The 16S rRNA gene sequence dataset used in this study was filtered for chimeric sequences using two different protocols, based on UCHIME and UPARSE (see Methods and Text S1). For the UCHIME workflow, filtering was performed in 'uchime_ref' mode, using a custom reference database of non-chimeric sequences, generated directly from the full set of unfiltered 16S rRNA sequences (see Text S1). This custom reference database was tailored to the present sequence dataset, thus allowing for more stringent chimera checking than general-purpose databases such as the frequently used GOLD database (Pagani et al., 2012; http://www.genomesonline.org). UPARSE implements similarity threshold-dependent on-the-fly chimera filtering at the time of clustering, followed by UCHIME filtering of cluster seed sequences (see Text S1 for detailed parameters). We observed that for all tested thresholds, UPARSE-filtered sets were entirely contained within the UCHIME-filtered sequence set (‘UPARSE ⊆ UCHIME’).

(A) The global, unfiltered dataset contained 6,760 16S rRNA gene sequences from fully sequenced genomes deposited in the curated NCBI RefSeq database. Out of these, UCHIME retained 97.4% as ‘non-chimeric’ when filtering against our custom reference database and 99.6% when filtering against the GOLD database. In contrast, UPARSE retained 53.5-76% of reads, depending on the clustering threshold.

(B) For the entire, global dataset, UCHIME retained 80.3% ‘non-chimeric’ sequences when filtering against the custom database and 97.2% against the GOLD database, while UPARSE retained 48.2-70.9%, depending on the threshold.
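For readers unfamiliar with the partition-similarity measures named in the captions for Figures S3 and S5, the following is a minimal sketch of how an Adjusted Rand Index between two clusterings of the same sequences can be computed from cluster labels. It is an illustration only and not the code used in the study; the function name `adjusted_rand_index` and the example label lists are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items.

    labels_a[i] and labels_b[i] are the cluster labels that two
    (hypothetical) clustering methods assign to item i.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs of items placed together by both methods (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each method on its own (row/column sums).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both methods use a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: same grouping with different label names.
method_a = [0, 0, 1, 1]
method_b = [1, 1, 0, 0]
print(adjusted_rand_index(method_a, method_b))
```

The index depends only on which items are grouped together, not on the label names, so relabeled but otherwise identical partitions score as fully concordant, while values near zero indicate agreement no better than chance.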
Created:
2014-10-28



