Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach" (draft version)
Source: DataCite Commons. Updated: 2020-09-19. Indexed: 2024-07-13.
Download link:
https://pub.uni-bielefeld.de/data/2915656
Description:
**NOTE: This data publication refers to an outdated version of the article. See [10.4119/unibi/2918928](http://doi.org/10.4119/unibi/2918928) for the data repository of the final manuscript.**<br><br>
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) and [Swarm](https://github.com/torognes/swarm/) performed in "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach" (submitted). The scripts for the analyses are available [here](https://github.com/romueller/gefast-paper-analysis).<br><br>
**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ and _uneven_ [2]). The original data sets (see below) were dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) were removed.<br>
* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)
The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).<br><br>
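The preprocessing described above (dereplication and removal of sequences with ambiguous bases) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, whose actual scripts are in the linked analysis repository. The case-insensitive comparison, the sort order, and the `seq<i>_<abundance>` header format are assumptions made for illustration only.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse identical sequences and drop those containing n/N.

    Assumption: sequences are compared case-insensitively.
    """
    counts = Counter()
    for _, seq in records:
        seq = seq.lower()
        if "n" in seq:  # ambiguous base: discard the whole sequence
            continue
        counts[seq] += 1
    # Most abundant first; ties broken by sequence for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Tiny in-memory example instead of a real data set.
example = io.StringIO(">a\nACGT\n>b\nacgt\n>c\nACNT\n>d\nGGCC\n")
derep = dereplicate(read_fasta(example))
for i, (seq, abundance) in enumerate(derep, 1):
    # Hypothetical header format: running index plus abundance.
    print(f">seq{i}_{abundance}\n{seq}")
```

In this toy input, the two copies of `ACGT` collapse into one entry with abundance 2, and the sequence containing `N` is dropped.<br><br>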
**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_ of size X, where X is the percentage of sequences drawn from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_).<br><br>
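As a rough illustration of how such subsamples could be drawn, the sketch below selects X percent of a list of records three times with different seeds. Sampling without replacement over the dereplicated entries, the rounding rule, and the seed values are assumptions. The description does not state how the published subsamples were generated.

```python
import random


def subsample(records, percent, seed):
    """Return a random subset holding `percent` % of `records`.

    Assumption: sampling is uniform and without replacement.
    """
    k = round(len(records) * percent / 100)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(records, k)


# Hypothetical stand-in for the entries of eldermet_derep.fasta.
records = [(f"seq{i}", "ACGT") for i in range(1, 201)]

# Three independent subsamples of size X = 20 %.
subsamples = [subsample(records, percent=20, seed=s) for s in (1, 2, 3)]
for s in subsamples:
    print(len(s))
```

Using one seeded `random.Random` instance per subsample keeps the three draws independent of each other and repeatable.<br><br>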
**results.tar.bz2**: Result files containing the performance and clustering-quality measurements.<br>
* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _even-metrics.csv_: clustering quality for different thresholds
* _uneven-metrics.csv_: clustering quality for different thresholds<br><br>
References:
[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)
[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, 593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)
Provider:
Bielefeld University
Created:
2017-12-11



