Supporting data for "PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes"

Name: Supporting data for "PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes"
Creator: GigaScience Database
Published: 2025-05-26 17:20:10
License: No description available

DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15

Download link:

http://gigadb.org/dataset/100715

Description:

Environmental DNA (eDNA) and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these steps, a plethora of tools is available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasies. Adding to this complexity, the computational capacity of High Performance Computing (HPC) systems is frequently required for such analyses. To address these difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy-to-get, easy-to-set, easy-to-use framework that allows researchers to tune the analysis to each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Programming languages specialized for big data pipelines do likewise, incorporating features such as roll-back checkpoints and on-demand partial pipeline execution.

PEMA is a containerized assembly of key metabarcoding analysis tools that requires little effort to set up, run, and customize to researchers' needs. Based on third-party tools, PEMA performs read pre-processing, (M)OTU clustering, ASV inference, and taxonomy assignment for 16S and 18S rRNA as well as ITS and COI marker gene data. Owing to its simplified parameterisation and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality.

An HPC-based approach was used to develop PEMA; however, it can be used on personal computers as well. Given its time-efficient performance and the quality of its results, it is suggested that PEMA can be used for accurate eDNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.

Provider:

GigaScience Database

Created:

2020-02-13