Evaluation data for "On the use of sequence-quality information in OTU clustering"

Name: Evaluation data for "On the use of sequence-quality information in OTU clustering"
Creator: Bielefeld University
Published: 2021-08-18 15:39:31
License: not specified

Source: DataCite Commons (updated 2021-08-18, indexed 2024-07-13)

Download link:

https://pub.uni-bielefeld.de/record/2951742

Description:

Prepared data sets and aggregated results of the comparative evaluation of the quality-aware clustering and refinement methods of [GeFaST](https://github.com/romueller/gefast), as performed in "On the use of sequence-quality information in OTU clustering" (submitted). GeFaST is compared to DADA2, USEARCH, VSEARCH, UPARSE and Swarm on two collections of data sets described in [*DADA2: High-resolution sample inference from Illumina amplicon data*](https://doi.org/10.1038/nmeth.3869) (Callahan et al.) and [*Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering*](https://doi.org/10.1186/s40168-015-0105-6) (Franzén et al.). The provided files allow the evaluation to be repeated, either by rerunning the tools or by only reanalysing the results. The evaluation repository is available at [romueller/gefast-qa-evaluation](https://github.com/romueller/gefast-qa-evaluation).

*Input files:*

- **callahan_data.tar.bz2**: Read files and ground truths of the Callahan data sets used by GeFaST, USEARCH, VSEARCH, UPARSE and Swarm. This archive should be extracted into the `analyses/model_supported_callahan/`, `analyses/quality_weighted_callahan/`, `analyses/swarm_callahan/`, `analyses/uvsearch_callahan/` and `analyses/performance/` subfolders of the evaluation repository.
- **dada2_callahan_data.tar.bz2**: Read files and ground truths of the Callahan data sets used by DADA2. This archive should be extracted into the `analyses/dada2_callahan/` subfolder of the evaluation repository. Because the workflow of DADA2 differs from those of the other examined tools, it requires slightly different ground-truth files for assessing the quality of the reconstructed clusters.
- **franzen_data.tar.bz2**: Read files and ground truths of the Franzén data sets used by all tools. This archive should be extracted into the `analyses/dada2_franzen/`, `analyses/model_supported_franzen/`, `analyses/quality_weighted_franzen/`, `analyses/swarm_franzen/` and `analyses/uvsearch_franzen/` subfolders of the evaluation repository. Since the origin of the *in silico* sequenced amplicons is known, the same ground truths can also be used for DADA2.

*Aggregated results:*

- **dada2_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of DADA2 on the Callahan data sets. This archive should be extracted into the `analyses/dada2_callahan/` subfolder of the evaluation repository.
- **dada2_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of DADA2 on the Franzén data sets. This archive should be extracted into the `analyses/dada2_franzen/` subfolder of the evaluation repository.
- **model_supported_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of the model-supported clustering and refinement methods of GeFaST on the Callahan data sets. This archive should be extracted into the `analyses/model_supported_callahan/` subfolder of the evaluation repository.
- **model_supported_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of the model-supported clustering and refinement methods of GeFaST on the Franzén data sets. This archive should be extracted into the `analyses/model_supported_franzen/` subfolder of the evaluation repository.
- **performance_evaluation.tar.bz2**: Aggregated results (clustering quality, runtime, memory consumption) of the different runs of GeFaST, USEARCH, VSEARCH, UPARSE and Swarm on the largest Callahan data set (hmp_single). This archive should be extracted into the `analyses/performance/` subfolder of the evaluation repository.
- **quality_weighted_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of GeFaST involving quality-weighted alignments on the Callahan data sets. This archive should be extracted into the `analyses/quality_weighted_callahan/` subfolder of the evaluation repository.
- **quality_weighted_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of GeFaST involving quality-weighted alignments on the Franzén data sets. This archive should be extracted into the `analyses/quality_weighted_franzen/` subfolder of the evaluation repository.
- **swarm_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of Swarm on the Callahan data sets. This archive should be extracted into the `analyses/swarm_callahan/` subfolder of the evaluation repository.
- **swarm_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of Swarm on the Franzén data sets. This archive should be extracted into the `analyses/swarm_franzen/` subfolder of the evaluation repository.
- **uvsearch_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of USEARCH, VSEARCH and UPARSE on the Callahan data sets. This archive should be extracted into the `analyses/uvsearch_callahan/` subfolder of the evaluation repository.
- **uvsearch_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of USEARCH, VSEARCH and UPARSE on the Franzén data sets. This archive should be extracted into the `analyses/uvsearch_franzen/` subfolder of the evaluation repository.
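Because each archive has to end up in one or more specific subfolders, the placement can be scripted. The following is a minimal sketch in Python and is not part of the evaluation repository: `ARCHIVE_TARGETS` and `extract_archives` are hypothetical names, and the archive-to-folder mapping is transcribed from the descriptions above. It assumes that all archives were downloaded into one directory under their original file names, that an archive listed for several subfolders should be extracted into each of them, and that the archive contents are meant to be unpacked directly into the target folder (as `tar -xjf ARCHIVE -C FOLDER` would do).

```python
import tarfile
from pathlib import Path

# Hypothetical mapping: archive name -> target subfolders (relative to the
# evaluation repository root), transcribed from the dataset description.
ARCHIVE_TARGETS = {
    "callahan_data.tar.bz2": [
        "analyses/model_supported_callahan",
        "analyses/quality_weighted_callahan",
        "analyses/swarm_callahan",
        "analyses/uvsearch_callahan",
        "analyses/performance",
    ],
    "dada2_callahan_data.tar.bz2": ["analyses/dada2_callahan"],
    "franzen_data.tar.bz2": [
        "analyses/dada2_franzen",
        "analyses/model_supported_franzen",
        "analyses/quality_weighted_franzen",
        "analyses/swarm_franzen",
        "analyses/uvsearch_franzen",
    ],
    "dada2_callahan_evaluation.tar.bz2": ["analyses/dada2_callahan"],
    "dada2_franzen_evaluation.tar.bz2": ["analyses/dada2_franzen"],
    "model_supported_callahan_evaluation.tar.bz2": ["analyses/model_supported_callahan"],
    "model_supported_franzen_evaluation.tar.bz2": ["analyses/model_supported_franzen"],
    "performance_evaluation.tar.bz2": ["analyses/performance"],
    "quality_weighted_callahan_evaluation.tar.bz2": ["analyses/quality_weighted_callahan"],
    "quality_weighted_franzen_evaluation.tar.bz2": ["analyses/quality_weighted_franzen"],
    "swarm_callahan_evaluation.tar.bz2": ["analyses/swarm_callahan"],
    "swarm_franzen_evaluation.tar.bz2": ["analyses/swarm_franzen"],
    "uvsearch_callahan_evaluation.tar.bz2": ["analyses/uvsearch_callahan"],
    "uvsearch_franzen_evaluation.tar.bz2": ["analyses/uvsearch_franzen"],
}


def extract_archives(download_dir, repo_root):
    """Extract every downloaded archive into each of its target subfolders.

    Archives not present in download_dir are skipped. Returns the list of
    (archive, target subfolder) pairs that were extracted.
    """
    download_dir = Path(download_dir).expanduser()
    repo_root = Path(repo_root).expanduser()
    extracted = []
    for archive, targets in ARCHIVE_TARGETS.items():
        path = download_dir / archive
        if not path.is_file():
            continue  # archive was not downloaded; nothing to do
        for target in targets:
            dest = repo_root / target
            dest.mkdir(parents=True, exist_ok=True)
            with tarfile.open(path, "r:bz2") as tar:
                # Use the safe "data" extraction filter where this Python supports it.
                if hasattr(tarfile, "data_filter"):
                    tar.extractall(dest, filter="data")
                else:
                    tar.extractall(dest)
            extracted.append((archive, target))
    return extracted
```

For example, `extract_archives("~/Downloads", "gefast-qa-evaluation")` would unpack every archive found in `~/Downloads` into a local clone of the evaluation repository and return the pairs it processed. Missing archives are skipped, so the same call also works when only the aggregated-result archives were downloaded for reanalysis.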

Provider:

Bielefeld University

Created:

2021-08-18