Datasets for "Reference library readiness for eDNA: A multi-marker gap analysis and prioritization roadmap of Philippine marine fishes"
Source: DataCite Commons · Updated: 2026-03-18 · Indexed: 2026-05-04
Download link:
https://data.mendeley.com/datasets/7nrbpjjnng/1
Description:
These datasets accompany the manuscript “Reference library readiness for eDNA: a multi-marker gap analysis and prioritization roadmap of Philippine marine fishes”. They contain the raw input files, the repository data package, Python processing and retrieval scripts, intermediate harmonised tables, downstream R analysis scripts, and exported summary outputs. Together, these files are required to reproduce and audit the study's taxonomic harmonisation, sequence-record retrieval, marker-coverage summaries, accession/record-depth analyses, family-level comparisons, provenance analyses, and manuscript figures.
The workflow begins with a curated species list used for WoRMS-based taxonomic harmonisation. This step produces the standardised species backbone and the associated tracking files that support downstream repository processing. The WoRMS-derived outputs, including the harmonised species table and the repository retrieval name list, are then used as inputs to both the GenBank and BOLD workflows.
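The harmonisation step can be pictured roughly as follows. This is a minimal sketch, not the archived script: it assumes WoRMS query results have already been collected into a local lookup keyed by input name, and `WormsMatch`, `harmonise`, `retrieval_names` and the placeholder taxa ("Aus bus" and similar) are all hypothetical. It shows how several input names can collapse onto one accepted species, how unmatched names are set aside, and why the retrieval name list includes synonyms: records deposited under outdated names are still found.

```python
from dataclasses import dataclass, field


@dataclass
class WormsMatch:
    """Hypothetical stand-in for one resolved WoRMS lookup result."""
    accepted_name: str
    family: str
    synonyms: list = field(default_factory=list)


def harmonise(species_list, worms_lookup):
    """Map each input name to its accepted name and collect unmatched names."""
    backbone = {}
    unresolved = []
    for raw in species_list:
        name = " ".join(raw.split())  # collapse stray whitespace
        match = worms_lookup.get(name)
        if match is None:
            unresolved.append(name)  # flag for manual review
            continue
        # Several input names (e.g. a synonym and its accepted name) can
        # collapse onto a single accepted species in the backbone.
        entry = backbone.setdefault(
            match.accepted_name,
            {"family": match.family, "input_names": [], "synonyms": set()},
        )
        entry["input_names"].append(name)
        entry["synonyms"].update(match.synonyms)
    return backbone, unresolved


def retrieval_names(backbone):
    """Accepted names plus synonyms and original input names, sorted."""
    names = set()
    for accepted, entry in backbone.items():
        names.add(accepted)
        names.update(entry["synonyms"])
        names.update(entry["input_names"])
    return sorted(names)


# Usage with placeholder taxa (hypothetical data).
lookup = {
    "Aus bus": WormsMatch("Aus bus", "Familyidae", ["Aus oldname"]),
    "Aus oldname": WormsMatch("Aus bus", "Familyidae", ["Aus oldname"]),
}
backbone, unresolved = harmonise(["Aus bus", "Aus  oldname", "Xus yus"], lookup)
print(unresolved)                 # ['Xus yus']
print(retrieval_names(backbone))  # ['Aus bus', 'Aus oldname']
```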
The GenBank workflow uses the WoRMS-harmonised inputs to retrieve and screen repository records, producing a resolved record-level feature table and downstream summary products including per-species marker coverage, per-species accession depth, marker-depth summaries, marker-presence summaries, family-level coverage tables, marker-tier summaries, and diagnostic outputs.
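The per-species coverage and accession-depth products can be illustrated with a short sketch. It assumes a record-level table reduced to `(species, marker, accession)` tuples. The function name `summarise` and the four marker labels in `MARKERS` are hypothetical, because this description does not name the markers analysed. The full harmonised species list is passed in so that species with no records still appear as explicit gaps. The species total counts the union of accessions because one accession can cover more than one marker.

```python
from collections import defaultdict

# Hypothetical marker set; this description does not name the four markers.
MARKERS = ("COI", "12S", "16S", "CytB")


def summarise(records, species, markers=MARKERS):
    """Per-species marker coverage and accession depth.

    records: iterable of (species, marker, accession) tuples taken from the
             screened record-level table (hypothetical layout).
    species: full harmonised species list, so species with no records at
             all still appear in the output as gaps.
    """
    accessions = defaultdict(set)  # (species, marker) -> unique accessions
    for sp, marker, acc in records:
        if marker in markers:
            accessions[(sp, marker)].add(acc)  # duplicate rows collapse here

    coverage, depth, total_depth = {}, {}, {}
    for sp in species:
        per_marker = {m: accessions.get((sp, m), set()) for m in markers}
        coverage[sp] = {m: bool(accs) for m, accs in per_marker.items()}
        depth[sp] = {m: len(accs) for m, accs in per_marker.items()}
        # One accession (e.g. a mitogenome) may span several markers,
        # so the species total counts the union, not the sum.
        total_depth[sp] = len(set().union(*per_marker.values()))
    return coverage, depth, total_depth
```

A species that appears in `species` but has no matching records ends up with every marker flagged `False` and a depth of zero. That is the behaviour a gap analysis needs.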
The BOLD workflow uses the same harmonised taxonomic inputs together with the archived public BOLD data package BOLD_Public.26-Sep-2025.tar.gz. This produces the initial BOLD record table and quality-control outputs. Because some target species were not represented in the public-package-derived record set, the workflow also incorporates a patching step based on a legacy minimal record file from a discontinued API retrieval workflow. The patched final record table is then used to generate resolved per-species marker coverage tables, per-species record-depth summaries, marker-depth summaries, marker-tier summaries, marker-presence summaries, family-level coverage tables, and diagnostic outputs.
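One plausible shape for the patching step is sketched below. Two things here are assumptions rather than documented behaviour. First, records are modelled as dicts with at least a `"species"` key. Second, legacy records are used only for target species that the public-package table does not cover. `patch_records` and the `source` tag values are hypothetical. Each output row is tagged with its origin so that provenance survives the merge, and species missing from both sources are returned as remaining gaps.

```python
def patch_records(public_records, legacy_records, target_species):
    """Fill species absent from the public-package table with legacy records.

    Records are dicts with at least a "species" key (hypothetical layout).
    Legacy rows are only added for target species the public package does
    not cover, and every output row is tagged with its source.
    """
    covered = {r["species"] for r in public_records}
    missing = set(target_species) - covered

    patched = [dict(r, source="public_package") for r in public_records]
    patch = [
        dict(r, source="legacy_api")
        for r in legacy_records
        if r["species"] in missing
    ]
    patched.extend(patch)

    # Species still absent after patching remain genuine gaps.
    still_missing = sorted(missing - {r["species"] for r in patch})
    return patched, still_missing
```

With this design, the public package stays the primary source wherever it has data, and the legacy file is used only to fill gaps. Whether the archived workflow applies exactly this rule should be checked against File_Index_and_Descriptions.pdf and the scripts themselves.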
Downstream analyses are performed in R using the processed GenBank and BOLD outputs. These scripts generate marker-coverage summaries, “lack all four markers” summaries, accession/record-depth figures, family-level bar charts and heatmaps, provenance summaries and maps, and manuscript-ready figure exports, so the R scripts handle both analysis and figure generation.
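The archived downstream scripts are written in R. The Python sketch below only illustrates the logic behind two of the summaries: the “lack all four markers” list and a family-level coverage table. It assumes a per-species coverage mapping of the form `{species: {marker: bool}}` and a `family_of` mapping from species to family. Both layouts and both function names are hypothetical.

```python
from collections import defaultdict


def lacking_all(coverage):
    """Species with no record for any of the assessed markers."""
    return sorted(sp for sp, flags in coverage.items() if not any(flags.values()))


def family_coverage(coverage, family_of):
    """Proportion of species per family with at least one record per marker."""
    present = defaultdict(lambda: defaultdict(int))  # family -> marker -> count
    n_species = defaultdict(int)                     # family -> species count
    for sp, flags in coverage.items():
        fam = family_of[sp]
        n_species[fam] += 1
        for marker, has_record in flags.items():
            present[fam][marker] += int(has_record)
    return {
        fam: {m: count / n_species[fam] for m, count in present[fam].items()}
        for fam in n_species
    }
```

Proportions like these are the kind of values a family-level bar chart or heatmap would plot. Raw counts would favour species-rich families.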
A detailed inventory of all archived folders, files, and their roles in the workflow is provided in File_Index_and_Descriptions.pdf. The archive is organised to preserve the provenance of the manuscript from raw taxonomic input and repository-specific processing through to the intermediate tables, final analyses, and exported figures.
Provider:
Mendeley Data
Created:
2026-03-18



