Datasets for "Reference library readiness for eDNA: A multi-marker gap analysis and prioritization roadmap of Philippine marine fishes"

Name: Datasets for "Reference library readiness for eDNA: A multi-marker gap analysis and prioritization roadmap of Philippine marine fishes"
Creator: Mendeley Data
Published: 2026-03-18 07:19:08
License: Not specified

DataCite Commons | Updated: 2026-03-18 | Indexed: 2026-05-04

Download link:

https://data.mendeley.com/datasets/7nrbpjjnng/1

Description:

These datasets accompany the manuscript “Reference library readiness for eDNA: A multi-marker gap analysis and prioritization roadmap of Philippine marine fishes”. They contain the raw input files, repository data package, Python processing and retrieval scripts, intermediate harmonised tables, downstream R analysis scripts, and exported summary outputs required to reproduce and audit the taxonomic harmonisation, sequence-record retrieval, marker-coverage summaries, accession/record-depth analyses, family-level comparisons, provenance analyses, and manuscript figure generation presented in the study.

The workflow begins with a curated species list used for WoRMS-based taxonomic harmonisation. That harmonisation step generates the standardised species backbone and related tracking files used to support downstream repository processing. The WoRMS-derived outputs, including the harmonised species table and repository retrieval name list, are then used as inputs for both the GenBank and BOLD workflows.

The GenBank workflow uses the WoRMS-harmonised inputs to retrieve and screen repository records, producing a resolved record-level feature table and downstream summary products including per-species marker coverage, per-species accession depth, marker-depth summaries, marker-presence summaries, family-level coverage tables, marker-tier summaries, and diagnostic outputs.

The BOLD workflow uses the same harmonised taxonomic inputs together with the archived public BOLD data package BOLD_Public.26-Sep-2025.tar.gz. This step produces the initial BOLD record table and quality-control outputs. Because some target species were not represented in the record set derived from the public package, the workflow also incorporates a patching step based on a legacy minimal record file from a discontinued API retrieval workflow.
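
The patching step described above can be illustrated with a minimal Python sketch. It is not the archived script: the function name `patch_records`, the `species`, `marker`, and `source` field names, and the example records are all assumptions made for illustration. The sketch adds legacy records only for target species that are absent from the primary record set, and tags each record with its origin so the patch stays auditable.

```python
# Hypothetical sketch of a "patch missing species" step; not the archived code.
# Field names ("species", "marker", "source") are assumptions for illustration.

def patch_records(primary, legacy, target_species):
    """Return (patched_records, still_missing_species).

    Legacy records are added only for target species that have no record
    in the primary (public-package-derived) set.
    """
    covered = {rec["species"] for rec in primary}
    missing = set(target_species) - covered

    # Tag every record with its origin so provenance is preserved.
    patched = [dict(rec, source="public_package") for rec in primary]
    patched += [
        dict(rec, source="legacy_api")
        for rec in legacy
        if rec["species"] in missing
    ]

    # Species that neither source could supply remain flagged as gaps.
    legacy_species = {rec["species"] for rec in legacy}
    still_missing = sorted(missing - legacy_species)
    return patched, still_missing


# Made-up example records, used only to exercise the function.
primary = [
    {"species": "Chanos chanos", "marker": "COI"},
    {"species": "Siganus guttatus", "marker": "COI"},
]
legacy = [
    {"species": "Lutjanus argentimaculatus", "marker": "COI"},
    {"species": "Chanos chanos", "marker": "COI"},  # already covered, so skipped
]
targets = [
    "Chanos chanos",
    "Siganus guttatus",
    "Lutjanus argentimaculatus",
    "Epinephelus coioides",
]

patched, still_missing = patch_records(primary, legacy, targets)
print(len(patched), still_missing)
```

Restricting the patch to species missing from the primary set avoids double-counting records for species that both sources cover.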
The patched final record table is then used to generate resolved per-species marker coverage tables, per-species record-depth summaries, marker-depth summaries, marker-tier summaries, marker-presence summaries, family-level coverage tables, and diagnostic outputs.

Downstream analyses are performed in R using the processed GenBank and BOLD outputs. These scripts generate marker-coverage summaries, “lack all four markers” summaries, accession/record-depth figures, family-level bar charts, family-level heatmaps, provenance summaries, provenance maps, and manuscript-ready figure exports. The R scripts therefore serve as both analysis and figure-generation scripts.

A detailed inventory of all archived folders, files, and their roles in the workflow is provided in File_Index_and_Descriptions.pdf. The archive is organised to preserve the provenance of the manuscript from raw taxonomic input and repository-specific processing through to the intermediate tables, final analyses, and exported figures.
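
As a rough illustration of the per-species marker coverage and “lack all four markers” summaries, the following Python sketch counts records per species and marker from a record-level table. It is a sketch under stated assumptions, not the study's code: the archive performs these summaries in its own Python and R scripts, the four marker names in `MARKERS` are placeholders (this description does not list the markers), and the function names, field names, and records are hypothetical.

```python
# Hypothetical sketch of marker-coverage summaries; not the archived code.
# MARKERS holds placeholder names: the description does not list the four markers.
MARKERS = ("COI", "12S", "16S", "cytb")


def marker_coverage(species_list, records, markers=MARKERS):
    """Count records per species and marker, keeping zero-record species."""
    counts = {sp: {m: 0 for m in markers} for sp in species_list}
    for rec in records:
        sp, marker = rec["species"], rec["marker"]
        # Ignore records for non-target species or untracked markers.
        if sp in counts and marker in counts[sp]:
            counts[sp][marker] += 1
    return counts


def lacking_all_markers(counts):
    """Species with no record for any tracked marker."""
    return sorted(sp for sp, per in counts.items() if not any(per.values()))


def coverage_fraction(counts, marker):
    """Share of target species with at least one record for the marker."""
    return sum(1 for per in counts.values() if per[marker] > 0) / len(counts)


# Made-up example data, used only to exercise the functions.
species = ["Chanos chanos", "Siganus guttatus", "Epinephelus coioides"]
records = [
    {"species": "Chanos chanos", "marker": "COI"},
    {"species": "Chanos chanos", "marker": "COI"},
    {"species": "Chanos chanos", "marker": "12S"},
    {"species": "Siganus guttatus", "marker": "16S"},
    {"species": "Gadus morhua", "marker": "COI"},  # not a target species
]

counts = marker_coverage(species, records)
lacking = lacking_all_markers(counts)
coi_share = coverage_fraction(counts, "COI")
print(counts["Chanos chanos"], lacking, round(coi_share, 3))
```

Initialising every target species with zero counts keeps species without any records visible in the summary, which is what a gap analysis needs.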

Provider:

Mendeley Data

Created:

2026-03-18