Evaluation data for "On the use of sequence-quality information in OTU clustering"

Source: DataCite Commons. Updated: 2021-08-18. Indexed: 2024-07-13.
Download link:
https://pub.uni-bielefeld.de/record/2951742
Description:
Prepared data sets and aggregated results of the comparative evaluation of [GeFaST](https://github.com/romueller/gefast)'s quality-aware clustering and refinement methods performed in "On the use of sequence-quality information in OTU clustering" (submitted).

GeFaST is compared to DADA2, USEARCH, VSEARCH, UPARSE and Swarm on two collections of data sets described in [*DADA2: High-resolution sample inference from Illumina amplicon data*](https://doi.org/10.1038/nmeth.3869) (Callahan et al.) and [*Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering*](https://doi.org/10.1186/s40168-015-0105-6) (Franzén et al.).

The provided files allow the evaluation to be repeated, either by rerunning the tools or by just reanalysing the results. The evaluation repository is available [here](https://github.com/romueller/gefast-qa-evaluation).

*Input files:*

- **callahan_data.tar.bz2**: Read files and ground truths of the Callahan data sets used by GeFaST, USEARCH, VSEARCH, UPARSE and Swarm. This archive should be extracted in the `analyses/model_supported_callahan/`, `analyses/quality_weighted_callahan/`, `analyses/swarm_callahan/`, `analyses/uvsearch_callahan/` and `analyses/performance/` subfolders of the evaluation repository.
- **dada2_callahan_data.tar.bz2**: Read files and ground truths of the Callahan data sets used by DADA2. This archive should be extracted in the `analyses/dada2_callahan/` subfolder of the evaluation repository. The workflow of DADA2 differs from that of the other examined tools and thus requires slightly different ground-truth files to assess the quality of the reconstructed clusters.
- **franzen_data.tar.bz2**: Read files and ground truths of the Franzén data sets used by all tools. This archive should be extracted in the `analyses/dada2_franzen/`, `analyses/model_supported_franzen/`, `analyses/quality_weighted_franzen/`, `analyses/swarm_franzen/` and `analyses/uvsearch_franzen/` subfolders of the evaluation repository. Since the origin of the *in silico* sequenced amplicons is known, the ground truths can also be used for DADA2.

*Aggregated results:*

- **dada2_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of DADA2 on the Callahan data sets. This archive should be extracted in the `analyses/dada2_callahan/` subfolder of the evaluation repository.
- **dada2_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of DADA2 on the Franzén data sets. This archive should be extracted in the `analyses/dada2_franzen/` subfolder of the evaluation repository.
- **model_supported_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of the model-supported clustering and refinement methods of GeFaST on the Callahan data sets. This archive should be extracted in the `analyses/model_supported_callahan/` subfolder of the evaluation repository.
- **model_supported_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of the model-supported clustering and refinement methods of GeFaST on the Franzén data sets. This archive should be extracted in the `analyses/model_supported_franzen/` subfolder of the evaluation repository.
- **performance_evaluation.tar.bz2**: Aggregated results (clustering quality, runtime, memory consumption) of the different runs of GeFaST, USEARCH, VSEARCH, UPARSE and Swarm on the largest Callahan data set (hmp_single). This archive should be extracted in the `analyses/performance/` subfolder of the evaluation repository.
- **quality_weighted_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of GeFaST involving quality-weighted alignments on the Callahan data sets. This archive should be extracted in the `analyses/quality_weighted_callahan/` subfolder of the evaluation repository.
- **quality_weighted_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of GeFaST involving quality-weighted alignments on the Franzén data sets. This archive should be extracted in the `analyses/quality_weighted_franzen/` subfolder of the evaluation repository.
- **swarm_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of Swarm on the Callahan data sets. This archive should be extracted in the `analyses/swarm_callahan/` subfolder of the evaluation repository.
- **swarm_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of Swarm on the Franzén data sets. This archive should be extracted in the `analyses/swarm_franzen/` subfolder of the evaluation repository.
- **uvsearch_callahan_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of USEARCH, VSEARCH and UPARSE on the Callahan data sets. This archive should be extracted in the `analyses/uvsearch_callahan/` subfolder of the evaluation repository.
- **uvsearch_franzen_evaluation.tar.bz2**: Aggregated results (clustering quality, number of clusters) of the different runs of USEARCH, VSEARCH and UPARSE on the Franzén data sets. This archive should be extracted in the `analyses/uvsearch_franzen/` subfolder of the evaluation repository.
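
The archive-to-subfolder mapping described above can be sketched as a small Python helper. This is only an illustration, not part of the evaluation repository: the function names `result_targets` and `extract_all` are hypothetical, and the sketch assumes that "extracted in" means unpacking each archive with the named subfolder as the extraction root, which should be checked against the actual archive contents.

```python
import tarfile
from pathlib import Path

# Input archives and the subfolders they belong in, as listed in the description.
INPUT_TARGETS = {
    "callahan_data.tar.bz2": [
        "analyses/model_supported_callahan",
        "analyses/quality_weighted_callahan",
        "analyses/swarm_callahan",
        "analyses/uvsearch_callahan",
        "analyses/performance",
    ],
    "dada2_callahan_data.tar.bz2": ["analyses/dada2_callahan"],
    "franzen_data.tar.bz2": [
        "analyses/dada2_franzen",
        "analyses/model_supported_franzen",
        "analyses/quality_weighted_franzen",
        "analyses/swarm_franzen",
        "analyses/uvsearch_franzen",
    ],
}


def result_targets():
    """Map each <name>_evaluation.tar.bz2 archive to analyses/<name>."""
    names = ["performance"] + [
        f"{tool}_{collection}"
        for tool in ("dada2", "model_supported", "quality_weighted", "swarm", "uvsearch")
        for collection in ("callahan", "franzen")
    ]
    return {f"{name}_evaluation.tar.bz2": [f"analyses/{name}"] for name in names}


def extract_all(download_dir, repo_dir, targets):
    """Extract every archive present in download_dir into each of its target folders.

    Assumption: the target subfolder itself is the extraction root.
    Returns the list of (archive, subfolder) pairs that were extracted.
    """
    done = []
    for archive, subfolders in targets.items():
        path = Path(download_dir) / archive
        if not path.is_file():
            continue  # skip archives that were not downloaded
        for sub in subfolders:
            dest = Path(repo_dir) / sub
            dest.mkdir(parents=True, exist_ok=True)
            with tarfile.open(path, "r:bz2") as tar:
                tar.extractall(dest)
            done.append((archive, sub))
    return done
```

With the archives downloaded into a local folder (here hypothetically `downloads/`) and the evaluation repository cloned as `gefast-qa-evaluation/`, a call such as `extract_all("downloads", "gefast-qa-evaluation", {**INPUT_TARGETS, **result_targets()})` would unpack both the input files and the aggregated results.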
Provider:
Bielefeld University
Created:
2021-08-18