Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach"
Source: DataCite Commons. Updated 2021-03-22; indexed 2024-07-13.
Download link:
https://pub.uni-bielefeld.de/record/2918928
Description:
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) performed in "[GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2349-1)". The scripts for the analyses are available [here](https://github.com/romueller/gefast-paper-analysis).

**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ and _uneven_ [2], _ELDERMET_). The original data sets (listed below) have been dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) have been removed.

* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)

The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).

**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_ of size X, where X is the percentage of sequences from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_) included in the subsample.

**uneven\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _uneven_, each containing 80 % of the sequences from _uneven\_derep.fasta_ (in _dereplicated.tar.bz2_).

**even\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _even_, each containing 80 % of the sequences from _even\_derep.fasta_ (in _dereplicated.tar.bz2_).

**eldermet\_reduced\_subsamples\_80.tar.bz2**: Archive containing the reduced _ELDERMET_ data set and five random subsamples of it, each containing 80 % of the sequences from _eldermet\_derep.reduced.fasta_, plus the corresponding taxonomic assignments.

**results.tar.bz2**: Result files containing the measurements of performance and clustering quality.

* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _eldermet-sub-fixed-red-log.csv_: runtime and memory consumption for different thresholds (on subsamples of the reduced data set)
* _eldermet-sub-fixed-red-metrics.csv_: clustering quality for different thresholds (on subsamples of the reduced data set)
* _even\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _even\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _even\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _uneven\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _uneven\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
* _even-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)

References:

[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)

[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, 593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)
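
The preprocessing described for _dereplicated.tar.bz2_ (dereplication, removal of sequences with _n_ / _N_) and the random subsampling used for the subsample archives can be illustrated with a minimal Python sketch. This is not the authors' pipeline; the actual scripts are in the analysis repository linked in the description. The function names (`read_fasta`, `dereplicate`, `subsample`), the case-insensitive comparison of sequences, the rounding of the subsample size, and the use of a fixed seed are all assumptions made for illustration.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse identical sequences and drop those containing n/N.

    Returns a dict mapping each distinct (upper-cased) sequence to a tuple
    (header of its first occurrence, number of occurrences).
    Assumption: sequences are compared case-insensitively.
    """
    unique = {}
    for header, seq in records:
        seq = seq.upper()
        if "N" in seq:
            continue  # ambiguous base: discard the whole sequence
        if seq in unique:
            first_header, count = unique[seq]
            unique[seq] = (first_header, count + 1)
        else:
            unique[seq] = (header, 1)
    return unique


def subsample(items, fraction, seed):
    """Draw a random subsample without replacement.

    Assumption: the subsample size is round(len(items) * fraction).
    """
    k = round(len(items) * fraction)
    return random.Random(seed).sample(items, k)
```

For example, `subsample(list(dereplicate(read_fasta(open(path)))), 0.8, seed=1)` would draw 80 % of the distinct sequences from a FASTA file at a hypothetical `path`; drawing with different seeds gives independent subsamples of the same size.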
Provider:
Bielefeld University
Created:
2018-04-06



