Sources of uncertainty in DNA metabarcoding of whole communities: implications for its use in biomonitoring
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.dncjsxm9t
Description:
These data were derived from a structured experiment to assess factors influencing the precision and accuracy of molecular methods for freshwater macroinvertebrate community assessment. Benthic macroinvertebrates were sorted, identified, and counted from nine individual kick-net samples using standard morphometric protocols, then reconstituted. These bulk specimen samples were then homogenised. From each bulk sample, aliquots of the homogenate were distributed among seven laboratories across Europe for DNA extraction, PCR amplification, library preparation, and sequencing. Additionally, each laboratory was provided with a DNA extract from the bulk homogenate to allow for amplification through to sequencing, in order to assess the influence of DNA extraction. The data consist of abundances of individual taxa identified using morphometric protocols and read counts of operational taxonomic units identified by the molecular methods. In addition, we provide a taxonomically harmonised matrix of the community composition derived from morphometric and molecular approaches to facilitate comparison between the two methods.
Methods
Sample collection and processing
Field sampling
Nine samples of the benthic macroinvertebrate community were collected from seven river sites in southern England, UK (see Jones et al., 2025). Each site had been studied for at least three years before the current sample collection, providing a comprehensive record of the taxa likely to be present. Samples were collected using the UK standard macroinvertebrate biomonitoring protocol, comprising a 3-minute kick sample and a 1-minute search with a net of 1 mm mesh size, sampling all habitats in proportion to their areal occurrence (BS EN 16150:2012). All material retained in the net was preserved on-site in 96% ethanol and returned to the laboratory.
Morphological identification
Macroinvertebrates were manually sorted from other material, identified to the lowest practicable taxonomic level (most to species or genus level, see Jones et al., 2025) and counted. Samples were processed by qualified, experienced freshwater biologists. All material and any picked animals were then reconstituted in the original 96% ethanol and stored at 4 °C. To ensure that data obtained through morphological analysis did not influence molecular workflows or outputs, the nine samples were anonymised (labelled A to I at random).
Molecular Analysis
Molecular analysis was carried out independently at seven laboratories across Europe, which were assigned a code (2–8) at random to preserve anonymity and eliminate bias.
At the Natural History Museum (NHM) in London, the nine samples were homogenised following Pereira-da-Conceicoa et al. (2021), with minor modifications (see Jones et al., 2025). Sub-samples of the homogenate were sent to the seven participating laboratories, each of which performed its own DNA extraction. The NHM also extracted DNA from each of the homogenised samples.
Library preparation and sequencing
Each laboratory prepared its libraries in-house following the two-step PCR approach outlined by Buchner et al. (2021), using the BF3/BR2 freshwater invertebrate primer set developed for the cytochrome c oxidase subunit I (COI) gene (Elbrecht et al., 2019). Individual laboratories made minor modifications to the protocol because of facility constraints (see Jones et al., 2025); however, key steps were kept consistent: the DNA extraction kit and primers used, the PCR profiles, and the clean-up steps. Two-step PCRs were performed on each extraction replicate using the Qiagen Multiplex PCR Plus Kit (Qiagen, Germany) with 0.2 µM of each primer in a final volume of 25 µl. PCR 1 was run under the following conditions: 95 °C for five minutes, followed by 30 cycles of 95 °C for 30 seconds, 50 °C for 30 seconds, and 72 °C for 30 seconds, with a final extension at 68 °C for 10 minutes. Three PCR replicates were carried out from each sub-sample of NHM/Lab extract. In PCR 2, 1 µl of amplicon from PCR 1 was used under the following conditions: 95 °C for five minutes, followed by 20 cycles of 95 °C for 30 seconds, 61 °C for 30 seconds, and 72 °C for 42 seconds, with a final extension at 68 °C for 10 minutes. All positive and negative controls were processed alongside samples.
PCR products were then cleaned using the recommended Agencourt AMPure beads at a 0.7x ratio, and the product size was checked on an agarose gel. The concentration of each sample was quantified using Qubit, TapeStation or qPCR (dependent on laboratory facilities), and samples were then pooled in equimolar concentrations, with the negative controls added at the maximum volume used for any single sample. Libraries were loaded at 12 pM concentration with 5% PhiX control. Samples were run on an Illumina MiSeq (MiSeq Reagent Kit v3, 600 cycles) following the manufacturer's run protocols for 300 bp PE sequencing (Illumina, Inc., San Diego, CA, USA).
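The equimolar pooling step can be illustrated with a short calculation. The sketch below is not the laboratories' actual procedure: the sample names, concentrations, amplicon length and target amount are hypothetical, and the conversion assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA. It shows how a pooling volume might be derived for each library, with the negative control added at the largest volume used for any single sample, as described above.

```python
# Hypothetical sketch of equimolar pooling; all input values are illustrative.

def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration to a molar concentration.

    Assumes an average mass of 660 g/mol per base pair (an approximation).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def pooling_volumes(conc_nM, target_fmol):
    """Volume (µl) of each library that contributes `target_fmol` to the pool.

    1 nM equals 1 fmol/µl, so volume = amount / concentration.
    """
    return {name: target_fmol / conc for name, conc in conc_nM.items()}


def negative_control_volume(volumes):
    """Negative controls are added at the maximum volume of any single sample."""
    return max(volumes.values())


# Hypothetical molar concentrations (nM) for three libraries.
example_conc_nM = {"sample_A": 10.0, "sample_B": 20.0, "sample_C": 40.0}
example_volumes = pooling_volumes(example_conc_nM, target_fmol=20.0)
example_negative_volume = negative_control_volume(example_volumes)
```

With these made-up inputs, the most dilute library needs the largest volume, and that same volume is used for the negative controls so that any contamination is not under-represented in the pool.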
Raw read data were compiled with participating laboratories anonymised, and then processed with the APSCALE pipeline (v 1.6.3; Buchner et al., 2022; https://github.com/DominikBuchner/apscale) using default settings. Taxonomic assignment was performed using BOLDigger (v 2.1.1; Buchner & Leese, 2020). The best hit was determined with the BOLDigger method and then further corrected using the API correction function (see Jones et al., 2025).
Data harmonisation and index calculation
Data harmonisation
Output from the bioinformatics pipeline comprised 7680 operational taxonomic units (OTUs) identified from the 504 PCR replicates (six replicate PCRs for nine samples, as well as nine negative controls and nine positive controls processed by each of the seven laboratories). Records of non-target taxa were removed (see Jones et al., 2025), where target taxa were defined as the freshwater macroinvertebrates considered in the mixed taxonomic level (MTL) system of the Environment Agency (2014). The resultant matrix consisted of 519 OTUs detected across the 504 PCR replicates.
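The replicate and OTU counts quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable and function names are illustrative, and reading the six replicate PCRs per sample as three from the NHM extract plus three from each laboratory's own extract is our interpretation of the protocol.

```python
# Counts taken from the text; names are illustrative.
N_LABS = 7
N_SAMPLES = 9
PCR_REPLICATES_PER_SAMPLE = 6  # read here as 3 (NHM extract) + 3 (lab extract)
NEGATIVE_CONTROLS_PER_LAB = 9
POSITIVE_CONTROLS_PER_LAB = 9

OTUS_TOTAL = 7680    # OTUs output by the bioinformatics pipeline
OTUS_TARGET = 519    # OTUs remaining after non-target taxa were removed


def pcr_reactions_per_lab():
    """Sample replicates plus negative and positive controls for one laboratory."""
    return (N_SAMPLES * PCR_REPLICATES_PER_SAMPLE
            + NEGATIVE_CONTROLS_PER_LAB
            + POSITIVE_CONTROLS_PER_LAB)


def total_pcr_reactions():
    """PCR replicates summed across all participating laboratories."""
    return N_LABS * pcr_reactions_per_lab()


def target_otu_fraction():
    """Fraction of OTUs retained as target freshwater macroinvertebrates."""
    return OTUS_TARGET / OTUS_TOTAL


print(pcr_reactions_per_lab(), total_pcr_reactions())
print(f"{target_otu_fraction():.1%} of OTUs retained")
```

Each laboratory therefore contributes 72 reactions, giving the 504 PCR replicates stated above, and roughly 7% of the OTUs are retained as target taxa.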
To facilitate comparison between morphologically derived and metabarcoding data, both datasets were harmonised to the same operational MTL, ensuring that the resulting list contained only discrete taxa (see Jones et al., 2025). The final harmonised list comprised 162 discrete taxa.
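The idea behind harmonisation can be sketched as a lookup that maps every reported name to a single operational taxon and then sums the values that land on the same taxon. This is a minimal illustration, not the procedure used by Jones et al. (2025): the lookup table, the counts and the read numbers below are hypothetical, and the real MTL rules are more detailed.

```python
from collections import defaultdict

# Hypothetical lookup: reported name -> harmonised (discrete) taxon.
# Species- and genus-level records of Baetis are collapsed to the genus so
# that the harmonised list contains no overlapping taxa.
EXAMPLE_LOOKUP = {
    "Baetis rhodani": "Baetis",
    "Baetis sp.": "Baetis",
    "Gammarus pulex": "Gammarus pulex",
}


def harmonise(records, lookup):
    """Sum values per harmonised taxon, dropping names absent from the lookup."""
    totals = defaultdict(int)
    for name, value in records.items():
        target = lookup.get(name)
        if target is None:
            continue  # unmapped names are treated as non-target here
        totals[target] += value
    return dict(totals)


# Made-up morphological counts and metabarcoding read counts.
example_morphology = {"Baetis rhodani": 12, "Baetis sp.": 3, "Gammarus pulex": 40}
example_reads = {"Baetis rhodani": 1500, "Gammarus pulex": 300, "Homo sapiens": 25}

harmonised_morphology = harmonise(example_morphology, EXAMPLE_LOOKUP)
harmonised_reads = harmonise(example_reads, EXAMPLE_LOOKUP)

# Taxa detected by both approaches after harmonisation.
shared_taxa = sorted(set(harmonised_morphology) & set(harmonised_reads))
```

Once both datasets share the same set of discrete taxa, abundances and read counts can be compared taxon by taxon.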
Created:
2025-06-12



