
Benchmark data sets, software results and reference data for the first CAMI challenge.

Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100344
Description:
In just over a decade, metagenomics has developed into a powerful and productive method in microbiology and microbial ecology. The ability to retrieve and organize bits and pieces of genomic DNA from any natural context has opened a window into the vast universe of uncultivated microbes. Tremendous progress has been made in computational approaches to interpreting this sequence data, but none can completely recover the complex information encoded in metagenomes. A number of challenges stand in the way. Simplifying assumptions are needed, and they lead to strong limitations and potential inaccuracies in practice. Critically, methodological improvements are difficult to gauge because there is no general standard for comparison. Developers also face a substantial burden in individually evaluating existing approaches, which consumes time and computational resources and may introduce unintended biases.

The Critical Assessment of Metagenome Interpretation (CAMI) is a community-led initiative that tackles these problems by aiming for an independent, comprehensive and bias-free evaluation of methods. In the first CAMI challenge, which ran from March to July 2015, it provided three simulated benchmark metagenome data sets of different organismal complexities and sizes. These were generated from approximately 700 newly sequenced genomes and approximately 600 circular elements (plasmids, viruses, other circular elements) that were not included in public databases during the challenge. These data sets are now available here, together with gold standards for assembly, genome and taxonomic binning, and taxonomic profiling; the underlying genome sequences; NCBI and ARB reference sequence snapshots from before the challenge; and the reference NCBI taxonomy used. In addition, three test (toy) data sets are provided that were simulated from public genomes before the challenge.

For the most realistic evaluation of reference-based methods (usually taxonomic binners and profilers) on the challenge data sets, the provided reference sequences or other sequence collections from before the challenge should be used as references, because all underlying genomes have by now been deposited at NCBI or EBI.
Provider:
GigaScience Database
Created:
2017-08-11