Environmental DNA metabarcoding to monitor tropical reef fishes in Santa Marta

Name: Environmental DNA metabarcoding to monitor tropical reef fishes in Santa Marta
Creator: figshare
Published: 2021-06-11 17:16:20
License: No description available

DataCite Commons: updated 2021-06-11, indexed 2024-07-28

Download link:

https://figshare.com/articles/dataset/Environmental_DNA_metabarcoding_to_monitor_tropical_reef_fishes_in_Santa_Marta/14771112

Description:

Environmental DNA (eDNA) provides a revolutionary method to monitor species in marine ecosystems from animal DNA present in the water. Examining the capacity of eDNA to provide accurate biodiversity measures in species-rich ecosystems such as coral reefs is a prerequisite for their long-term monitoring. Here, we surveyed a Colombian tropical marine reef, Gayraca Bay near Santa Marta, using the eDNA method. We collected a large quantity of surface water (30 L per filter) above the reefs and applied a metabarcoding protocol using three different primer sets targeting the 12S mitochondrial DNA, specific to vertebrates, Actinopterygii and Elasmobranchii. The assignment of eDNA sequences to species using a public reference database allowed us to detect the presence of 85 fish species, 92 genera and 57 families in Providencia.

Filtering and taxonomic assignments

OBITools clustering: Following sequencing, reads were processed to remove errors and analyzed using programs implemented in the OBITools package (http://metabarcoding.org/obitools; Boyer et al., 2016) following a previous protocol (Valentini et al., 2016). We assembled the forward and reverse reads using the ILLUMINAPAIREDEND program with a minimum score of 40, retrieving only joined sequences. Then, we assigned the reads to each sample using the NGSFILTER software. A separate data set was created for each sample by splitting the original data set into several files using OBISPLIT. After this step, we analyzed each sample data set individually before merging the taxon lists for the final ecological analysis. Strictly identical sequences were clustered together using OBIUNIQ. We excluded sequences shorter than 20 bp or with fewer than 10 reads using the OBIGREP program and ran the OBICLEAN program within a PCR product. All sequences labeled ‘internal’, which most likely corresponded to PCR substitution and indel errors, were discarded.
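The length and abundance filter described above can be sketched in Python as follows. This is only an illustration of the rule, not the OBIGREP implementation: the `UniqueSequence` record and the function names are hypothetical, and the default thresholds (20 bp, 10 reads) are the ones stated in the text.

```python
from typing import NamedTuple


class UniqueSequence(NamedTuple):
    """Hypothetical record: one dereplicated sequence and its read count."""
    sequence: str
    count: int


def passes_filter(rec, min_length=20, min_count=10):
    """Keep a sequence only if it is at least min_length bp long
    and is supported by at least min_count reads."""
    return len(rec.sequence) >= min_length and rec.count >= min_count


def filter_sequences(records):
    """Return the records that survive both thresholds, in input order."""
    return [rec for rec in records if passes_filter(rec)]
```

A sequence of 19 bp, or one supported by 9 reads, is dropped; a sequence of exactly 20 bp with exactly 10 reads is kept.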
We performed the taxonomic assignment of the remaining sequences with the program ECOTAG, using the NCBI reference database (www.ncbi.nlm.nih.gov, release 233, downloaded on 11 Oct. 2019). We corrected the taxonomic assignment outputs to avoid over-confidence in assignments: species-level assignments were validated only for sequences with an identification match >98%, genus-level assignments for a 96-98% match and family-level assignments for a 90-96% match.

To account for the incorrect assignment of a few sequences to samples due to tag-jumps (Schnell et al., 2015), we discarded all sequences with a frequency of occurrence < 0.001 per sequence and per library. For example, if a sequence has a total read count of 100,000 in the library, all detections of this sequence below 100 reads (100,000 * 0.001 = 100) in a tag combination are discarded.

We further corrected for index-hopping between libraries within a sequencing batch (MacConaill et al., 2018), with a threshold determined empirically per sequencing batch using experimental blanks (i.e. combinations of tags not present in the libraries). This threshold is defined to remove all reads present in plates where the combination of tags was not present in the library, and is then applied at each plate position. For example, with our selected threshold of 0.001, if a sequence has a total read count of 10,000 at the P1_A1 plate position of library A, all detections of this sequence below 10 reads (10,000 * 0.001 = 10) are discarded at plate position P1_A1 of library B if libraries A and B belong to the same sequencing batch.

Swarm clustering: We applied a second bioinformatics workflow, the clustering algorithm SWARM, which uses sequence similarity and abundance patterns to cluster multiple variants of sequences into MOTUs (Molecular Operational Taxonomic Units; Mahé et al., 2014; Rognes et al., 2016).
While the OBITools bioinformatics pipeline optimizes the taxonomic identification of sequences, even of rare ones, the SWARM approach clusters similar sequences and provides full compositional matrices even in the absence of a complete reference database (Marques et al., 2020). First, we merged the sequences using the vsearch software and removed those containing ambiguities (Rognes et al., 2016). We then applied the CUTADAPT software (Martin, 2013) for demultiplexing and primer trimming (Table TS2). Next, we ran SWARM with a minimum distance of one mismatch to make clusters. Once the MOTUs were generated, we used the most abundant sequence within each cluster as the representative sequence for taxonomic assignment. Then, we applied a post-clustering curation algorithm (LULU; Frøslev et al., 2017) to curate the data. We validated the outputs using the same thresholds as for the OBITools pipeline. Further quality cleaning was identical to that used in the OBITools pipeline (identify the minimum number of reads, remove non-target taxa, apply tag-jump cleaning), with the addition of a single step removing all MOTUs present in only one PCR within the entire data set. This additional step was necessary because PCR errors are unlikely to be present in more than one PCR occurrence, and it removes spurious MOTUs that would otherwise inflate diversity estimates (see Marques et al., 2020). For the teleo marker, this approach has been validated with fish observation data, where MOTUs generally correspond to species (Marques et al., 2020), but estimates have so far not been validated for other markers.
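The cleaning rules shared by both pipelines (identity thresholds, tag-jump filter) and the extra single-PCR step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: all function names and the dictionary layouts are hypothetical, and the handling of identities falling exactly on 98, 96 or 90% is an assumption, since the text gives only the ranges.

```python
from collections import defaultdict


def rank_for_identity(identity_pct):
    """Finest rank accepted for a best-match identity given in percent.
    Assumption: exactly 98 counts as genus, exactly 96 as genus, exactly 90 as family."""
    if identity_pct > 98:
        return "species"
    if identity_pct >= 96:
        return "genus"
    if identity_pct >= 90:
        return "family"
    return None  # below 90%: no assignment validated


def remove_tag_jumps(counts, threshold=0.001):
    """Drop detections below threshold * total reads of that sequence.
    counts maps (sequence_id, sample_id) -> reads, for a single library."""
    totals = defaultdict(int)
    for (seq_id, _sample_id), n in counts.items():
        totals[seq_id] += n
    return {key: n for key, n in counts.items()
            if n >= threshold * totals[key[0]]}


def remove_single_pcr_motus(pcr_counts):
    """Drop MOTUs detected in only one PCR across the whole data set.
    pcr_counts maps (motu_id, pcr_id) -> reads."""
    pcrs_per_motu = defaultdict(set)
    for (motu_id, pcr_id), n in pcr_counts.items():
        if n > 0:
            pcrs_per_motu[motu_id].add(pcr_id)
    return {key: n for key, n in pcr_counts.items()
            if len(pcrs_per_motu[key[0]]) > 1}
```

With a sequence totalling 100,000 reads in a library, `remove_tag_jumps` discards any detection of that sequence below 100 reads, matching the worked example in the description.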

Provider:

figshare

Created:

2021-06-11
