Supporting data for "Mantis: flexible and consensus-driven genome annotation"

Name: Supporting data for "Mantis: flexible and consensus-driven genome annotation"
Creator: GigaScience Database
Published: 2025-05-26 17:15:42
License: not specified

DataCite Commons. Updated 2025-05-26. Indexed 2025-04-15.

Download link:

http://gigadb.org/dataset/100903

Description:

The past decades have seen rapid development of the (meta-)omics fields, producing an unprecedented amount of high-resolution and high-fidelity data. Using these datasets, we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, some challenges remain, specifically in terms of speed, flexibility, and reproducibility. In the era of big data analysis, it is also increasingly important to stop limiting our findings to a single reference and instead coalesce knowledge from different data sources, thereby overcoming some of the limitations of relying too heavily on computationally generated data from single sources.

We implemented a protein annotation tool, Mantis, which uses the intersection of database identifiers and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high-coverage and high-quality protein function annotations.

Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes.
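
The difference between domain-specific and sequence-wide annotation mentioned above can be illustrated with a small sketch. It is not Mantis's actual implementation: the `Hit` record, its fields, and the "sum of scores" objective are assumptions made for illustration only. The sketch runs a depth-first search over combinations of non-overlapping hits on one protein and keeps the combination with the highest total score, whereas a sequence-wide approach would keep only the single best hit.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A hypothetical reference-profile hit on a query protein."""
    ref_id: str   # identifier of the matched reference profile (illustrative)
    start: int    # inclusive start coordinate on the query
    end: int      # inclusive end coordinate on the query
    score: float  # higher is better (illustrative scoring scheme)


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def best_combination(hits: List[Hit]) -> Tuple[float, List[Hit]]:
    """Depth-first search over all sets of mutually non-overlapping hits.

    Returns the highest total score and the hits that achieve it.
    The search is exhaustive, so it is only suitable for small hit lists.
    """
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    best: Tuple[float, List[Hit]] = (0.0, [])

    def dfs(index: int, chosen: List[Hit], total: float) -> None:
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        # Try extending the current combination with each later hit.
        for i in range(index, len(ordered)):
            candidate = ordered[i]
            if all(not overlaps(candidate, c) for c in chosen):
                chosen.append(candidate)
                dfs(i + 1, chosen, total + candidate.score)
                chosen.pop()  # backtrack

    dfs(0, [], 0.0)
    return best


if __name__ == "__main__":
    example = [
        Hit("A", 1, 100, 50.0),   # one hit spanning the whole sequence
        Hit("B", 1, 40, 30.0),    # N-terminal domain hit
        Hit("C", 50, 100, 35.0),  # C-terminal domain hit
    ]
    total, combo = best_combination(example)
    print(total, [h.ref_id for h in combo])
```

In this toy input, the two domain-level hits together outscore the single sequence-wide hit, which is the kind of case where a per-domain search can recover annotations that a best-hit-only approach would miss.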

Provider:

GigaScience Database

Created:

2021-05-10
