World Register of Marine Species (WoRMS) Repackaged hash://sha256/4e969a1c8243b523b093d3a05fd5f7683479c2919e7d83e8b1383c5e5ef1d4e5 hash://md5/fb7559ce707d11f96a878d8a8a79a661
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12112609
Description:
Introduction
The World Register of Marine Species (WoRMS) (Ahyong et al. 2024) aims to “[…] provide an authoritative and comprehensive list of names of marine organisms, including information on synonymy. While the highest priority goes to valid names, other names in use are included so that this register can serve as a guide to interpret taxonomic literature. […]”
This data publication contains a verifiable copy of the World Register of Marine Species as well as a streamable version in json-line format. The aims of this publication are to:
provide a signed citation (Elliott, Poelen, and Fortes 2023) for a copy of WoRMS.
prepare WoRMS to be included in the Nomer Corpus of Taxonomic Resources (Poelen (ed.) 2023).
pre-process the World Register of Marine Species to facilitate optimized indexing and offline taxonomic name alignments using tools like Nomer (J. Poelen and Salim 2023).
Overall, this publication aims to facilitate taxonomic name alignment using the wealth of information provided by the World Register of Marine Species (WoRMS), enabling fast, reproducible, offline-enabled alignment of name lists with taxonomic resources of known provenance (or origin).
An example of an application facilitated by this publication is the Taxonomic Name Alignment tool provided through https://github.com/globalbioticinteractions/name-alignment-template. This template repository implements an automated workflow using GitHub Actions to align scientific names in csv/tsv files and Darwin Core archives with common taxonomic name lists like WoRMS, NCBI Taxonomy, Integrated Taxonomic Information System (ITIS), and GBIF Backbone taxonomy.
Methods
To capture and process WoRMS, the following steps were taken:
1. re-use a tracked and archived copy of WoRMS
2. make the WoRMS archive streamable by translating the WoRMS DwC-A into line-json
3. assign an alias to the processed resources
Steps 1-3 are captured and documented using Preston, a biodiversity data tracker. Preston not only helps to document the steps, but also includes the digital resources that were used and produced.
Reuse Versioned WoRMS Archive
To reuse a versioned copy of the June 2024 WoRMS as provided through Checklist Bank, the following commands were issued:
# define dependency on a June 2024 version of Checklist Bank
preston use hash://sha256/763edde4043c32ff53b9d8fc945a67b6409adccc0757a3bff1e52cccd4802476
# resolve WoRMS by its Checklist Bank alias, stream it as json-lines, compress, and track the result
preston cat hash://sha256/763edde4043c32ff53b9d8fc945a67b6409adccc0757a3bff1e52cccd4802476 \
  --remote https://linker.bio \
  | grep hasVersion \
  | grep "https://api.checklistbank.org/dataset/2011/archive.zip" \
  | head -1 \
  | preston dwc-stream --remote https://linker.bio \
  | gzip \
  | preston track
# define an alias shortcut to the generated json.gz archive
preston head \
  | preston cat \
  | grep hasVersion \
  | grep -oE "hash://[a-z0-9]+/[a-f0-9]+" \
  | xargs -L1 preston alias worms:worms.json.gz
With this, a previously archived copy of https://api.checklistbank.org/dataset/2011/archive.zip is retrieved and its sha256 checksum (or hash) is calculated. The original download process was captured earlier as machine-readable rdf/nquads statements, identified by hash://sha256/763edde4043c32ff53b9d8fc945a67b6409adccc0757a3bff1e52cccd4802476.
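As a minimal sketch of what calculating such a checksum involves, the shell functions below derive a hash://sha256/... (or hash://md5/...) identifier from a local file and compare it with an expected value. The function names content_id and verify_content_id are illustrative and are not Preston commands; the sketch assumes the GNU coreutils tools sha256sum and md5sum are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Preston): compute and check content ids
# of the form hash://<algorithm>/<hex digest> for a local file.

content_id() {
  local algo="$1" file="$2" hex
  case "$algo" in
    # Reading from stdin keeps the file name out of the tool's output.
    sha256) hex="$(sha256sum < "$file" | cut -d ' ' -f 1)" ;;
    md5)    hex="$(md5sum < "$file" | cut -d ' ' -f 1)" ;;
    *) echo "unsupported algorithm: $algo" >&2; return 2 ;;
  esac
  printf 'hash://%s/%s\n' "$algo" "$hex"
}

# Succeeds only if the bytes of the file hash to the expected content id.
verify_content_id() {
  local expected="$1" file="$2" algo
  algo="${expected#hash://}"   # "sha256/<hex>"
  algo="${algo%%/*}"           # "sha256"
  [ "$(content_id "$algo" "$file")" = "$expected" ]
}
```

With a downloaded copy of the archive saved under a local file name of your choosing, calling verify_content_id with the published content id and that file name would return success only when the bytes match.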
An example record from the derived dataset worms:worms.json.gz included in this publication can be found using:
preston alias \
  --anchor hash://sha256/4e969a1c8243b523b093d3a05fd5f7683479c2919e7d83e8b1383c5e5ef1d4e5 \
  --remote https://zenodo.org/deposit/8327611/files \
  worms:worms.json.gz \
  | preston cat \
  | gunzip \
  | head -n1 \
  | jq .
and is
{
"http://www.w3.org/ns/prov#wasDerivedFrom": "line:zip:hash://sha256/ca4baa1a90f4ce80d27ec7d1525f54a75fa679d76a66ed6143b17031287dea09!/taxon.txt!/L2",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://rs.tdwg.org/dwc/terms/Taxon",
"http://rs.tdwg.org/dwc/text/id": "urn:lsid:marinespecies.org:taxname:1",
"http://rs.tdwg.org/dwc/terms/scientificNameAuthorship": null,
"http://rs.tdwg.org/dwc/terms/nomenclaturalStatus": null,
"http://rs.tdwg.org/dwc/terms/acceptedNameUsage": "Biota",
"http://rs.tdwg.org/dwc/terms/infraspecificEpithet": null,
"http://rs.tdwg.org/dwc/terms/taxonRank": null,
"http://rs.tdwg.org/dwc/terms/phylum": null,
"http://rs.tdwg.org/dwc/terms/scientificNameID": "urn:lsid:marinespecies.org:taxname:1",
"http://rs.tdwg.org/dwc/terms/scientificName": "Biota",
"http://rs.tdwg.org/dwc/terms/parentNameUsage": "Biota",
"http://rs.tdwg.org/dwc/terms/datasetName": "World Register of Marine Species (WoRMS)",
"http://purl.org/dc/terms/references": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=1",
"http://rs.tdwg.org/dwc/terms/subgenus": null,
"http://purl.org/dc/terms/rightsHolder": "WoRMS Editorial Board",
"http://rs.tdwg.org/dwc/terms/family": null,
"http://rs.tdwg.org/dwc/terms/order": null,
"http://rs.tdwg.org/dwc/terms/class": null,
"http://purl.org/dc/terms/bibliographicCitation": "WoRMS (2024). Biota. Accessed at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=1",
"http://rs.tdwg.org/dwc/terms/genus": null,
"http://purl.org/dc/terms/license": "http://creativecommons.org/licenses/by/4.0/",
"http://rs.tdwg.org/dwc/terms/namePublishedInYear": null,
"http://rs.tdwg.org/dwc/terms/institutionCode": "VLIZ",
"http://rs.tdwg.org/dwc/terms/acceptedNameUsageID": "urn:lsid:marinespecies.org:taxname:1",
"http://rs.tdwg.org/dwc/terms/kingdom": null,
"http://rs.tdwg.org/dwc/terms/taxonID": "urn:lsid:marinespecies.org:taxname:1",
"http://rs.tdwg.org/dwc/terms/parentNameUsageID": null,
"http://rs.tdwg.org/dwc/terms/datasetID": "https://doi.org/10.14284/170",
"http://rs.tdwg.org/dwc/terms/nomenclaturalCode": null,
"http://rs.tdwg.org/dwc/terms/namePublishedInID": null,
"http://purl.org/dc/terms/modified": "2004-12-21",
"http://rs.tdwg.org/dwc/terms/taxonomicStatus": "accepted",
"http://rs.tdwg.org/dwc/terms/namePublishedIn": null,
"http://rs.tdwg.org/dwc/terms/specificEpithet": null
}
which shows the first json-line object retrieved from the WoRMS DwC-A.
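Each record carries a http://www.w3.org/ns/prov#wasDerivedFrom pointer back to its source line. The sketch below splits such a pointer into the archive content id, the file inside the archive, and the line number. The helper name parse_line_pointer is hypothetical, and the pointer layout (line:zip:<content id>!/<file>!/L<number>) is inferred from the example records only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a pointer such as
#   line:zip:hash://sha256/<hex>!/taxon.txt!/L2
# into tab-separated archive id, archive entry, and line number.

parse_line_pointer() {
  local pointer="$1" sep='!/' rest archive entry line
  rest=${pointer#line:zip:}     # drop the "line:zip:" prefix
  archive=${rest%%"$sep"*}      # content id of the zip archive
  rest=${rest#*"$sep"}          # e.g. "taxon.txt!/L2"
  entry=${rest%%"$sep"*}        # file inside the archive
  line=${rest##*"$sep"L}        # line number without the leading "L"
  printf '%s\t%s\t%s\n' "$archive" "$entry" "$line"
}
```

Given a local copy of the zip archive, the entry and line number could then be used to look up the original row, for instance by printing that line of taxon.txt with sed.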
Results
As described in the Methods, this publication derived the resource with alias worms:worms.json.gz and content id hash://sha256/0a6625167b6943d31cfa737dd5f6f123c767d236194f6a1c97c44f678d9fcaa7. This resource contains a streamable line-json archive derived from a versioned copy of the data package retrieved from https://api.checklistbank.org/dataset/2011/archive.zip with content id hash://sha256/ca4baa1a90f4ce80d27ec7d1525f54a75fa679d76a66ed6143b17031287dea09.
The following tools were used to process the WoRMS resource:
Tools used in this data publication
tool name
preston
bash
gzip
sed
head
Discussion
This publication is intended to facilitate re-use of the WoRMS data package in taxonomic name alignment workflows. While the primary goal was to generate a resource for use in Nomer v0.4.9 (J. Poelen and Salim 2023), other uses can be imagined, such as:
Lots of Copies Keeps Stuff Safe (LOCKSS; Maniatis et al. 2005): keeping an identical copy of the WoRMS data package outside of the Checklist Bank/WoRMS infrastructure.
making a json-line streamable copy of WoRMS available via https://zenodo.org/record/12112610/files/0a6625167b6943d31cfa737dd5f6f123c767d236194f6a1c97c44f678d9fcaa7 for use in workflows such as looking up the first record that contains Enhydra lutris (sea otter):
curl -L 'https://zenodo.org/record/12112610/files/0a6625167b6943d31cfa737dd5f6f123c767d236194f6a1c97c44f678d9fcaa7'\
| gunzip\
| grep "Enhydra lutris"\
| head -n1\
| jq .
producing
{
"http://www.w3.org/ns/prov#wasDerivedFrom": "line:zip:hash://sha256/ca4baa1a90f4ce80d27ec7d1525f54a75fa679d76a66ed6143b17031287dea09!/taxon.txt!/L111764",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://rs.tdwg.org/dwc/terms/Taxon",
"http://rs.tdwg.org/dwc/text/id": "urn:lsid:marinespecies.org:taxname:242598",
"http://rs.tdwg.org/dwc/terms/scientificNameAuthorship": "(Linnaeus, 1758)",
"http://rs.tdwg.org/dwc/terms/nomenclaturalStatus": null,
"http://rs.tdwg.org/dwc/terms/acceptedNameUsage": "Enhydra lutris",
"http://rs.tdwg.org/dwc/terms/infraspecificEpithet": null,
"http://rs.tdwg.org/dwc/terms/taxonRank": "Species",
"http://rs.tdwg.org/dwc/terms/phylum": "Chordata",
"http://rs.tdwg.org/dwc/terms/scientificNameID": "urn:lsid:marinespecies.org:taxname:242598",
"http://rs.tdwg.org/dwc/terms/scientificName": "Enhydra lutris",
"http://rs.tdwg.org/dwc/terms/parentNameUsage": "Enhydra",
"http://rs.tdwg.org/dwc/terms/datasetName": "World Register of Marine Species (WoRMS)",
"http://purl.org/dc/terms/references": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=242598",
"http://rs.tdwg.org/dwc/terms/subgenus": null,
"http://purl.org/dc/terms/rightsHolder": "WoRMS Editorial Board",
"http://rs.tdwg.org/dwc/terms/family": "Mustelidae",
"http://rs.tdwg.org/dwc/terms/order": "Carnivora",
"http://rs.tdwg.org/dwc/terms/class": "Mammalia",
"http://purl.org/dc/terms/bibliographicCitation": "WoRMS (2024). Enhydra lutris (Linnaeus, 1758). Accessed at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=242598",
"http://rs.tdwg.org/dwc/terms/genus": "Enhydra",
"http://purl.org/dc/terms/license": "http://creativecommons.org/licenses/by/4.0/",
"http://rs.tdwg.org/dwc/terms/namePublishedInYear": null,
"http://rs.tdwg.org/dwc/terms/institutionCode": "VLIZ",
"http://rs.tdwg.org/dwc/terms/acceptedNameUsageID": "urn:lsid:marinespecies.org:taxname:242598",
"http://rs.tdwg.org/dwc/terms/kingdom": "Animalia",
"http://rs.tdwg.org/dwc/terms/taxonID": "urn:lsid:marinespecies.org:taxname:242598",
"http://rs.tdwg.org/dwc/terms/parentNameUsageID": "urn:lsid:marinespecies.org:taxname:242597",
"http://rs.tdwg.org/dwc/terms/datasetID": "https://doi.org/10.14284/170",
"http://rs.tdwg.org/dwc/terms/nomenclaturalCode": "ICZN",
"http://rs.tdwg.org/dwc/terms/namePublishedInID": null,
"http://purl.org/dc/terms/modified": "2010-05-20",
"http://rs.tdwg.org/dwc/terms/taxonomicStatus": "accepted",
"http://rs.tdwg.org/dwc/terms/namePublishedIn": null,
"http://rs.tdwg.org/dwc/terms/specificEpithet": "lutris"
}
Note that the following commands should also produce the exact same results:
preston cat --remote https://linker.bio,https://zenodo.org/record/12112610/files/ \
  hash://sha256/0a6625167b6943d31cfa737dd5f6f123c767d236194f6a1c97c44f678d9fcaa7 \
  | gunzip \
  | grep "Enhydra lutris" \
  | head -n1 \
  | jq .
and
preston cat --remote https://linker.bio,https://zenodo.org \
  hash://sha256/0a6625167b6943d31cfa737dd5f6f123c767d236194f6a1c97c44f678d9fcaa7 \
  | gunzip \
  | grep "Enhydra lutris" \
  | head -n1 \
  | jq .
and
preston cat --remote https://linker.bio,https://zenodo.org \
  hash://md5/aa54d0c3587526e3da167cf93afa541f \
  | gunzip \
  | grep "Enhydra lutris" \
  | head -n1 \
  | jq .
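The lookups above use grep, which matches the name anywhere in a record, so a record that merely mentions Enhydra lutris in another field (for example as an accepted name of a synonym) may also match. A possible refinement, sketched here under the assumption that jq is installed, is to compare the http://rs.tdwg.org/dwc/terms/scientificName field exactly. The helper name find_by_scientific_name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read json-lines on stdin and emit, one per line, only
# the records whose Darwin Core scientificName equals the given name exactly.

find_by_scientific_name() {
  jq -c --arg name "$1" \
    'select(.["http://rs.tdwg.org/dwc/terms/scientificName"] == $name)'
}
```

In the pipelines above, the grep "Enhydra lutris" stage could be replaced with find_by_scientific_name "Enhydra lutris"; the trailing head -n1 and jq . stages would work unchanged. Because jq parses every record, this is likely slower than grep on the full archive.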
Acknowledgements
This work stands on the shoulders of contributors to open source software and openly accessible datasets. Thank you!
References
Ahyong, S., C. B. Boyko, N. Bailly, J. Bernot, R. Bieler, S. N. Brandão, M. Daly, et al. 2024. “World Register of Marine Species (WoRMS).” WoRMS Editorial Board. https://www.marinespecies.org. https://doi.org/10.14284/170.
Elliott, Michael J., Jorrit H. Poelen, and José A. B. Fortes. 2023. “Signing Data Citations Enables Data Verification and Citation Persistence.” Scientific Data 10 (1). https://doi.org/10.1038/s41597-023-02230-y.
Maniatis, Petros, Mema Roussopoulos, Thomas J Giuli, David SH Rosenthal, and Mary Baker. 2005. “The LOCKSS Peer-to-Peer Digital Preservation System.” ACM Transactions on Computer Systems (TOCS) 23 (1): 2–50.
Poelen, Jorrit H. (ed.). 2023. “Nomer Corpus of Taxonomic Resources hash://sha256/12051b8aa59930d6561a3ed46b7cf3f67a31a98445a457d78894f6b8a8e81641 hash://md5/1ff6b3628d7afc15b882cc0c9b1c3815.” Zenodo. https://doi.org/10.5281/zenodo.8326175.
Poelen, Jorrit, and José Augusto Salim. 2023. “Globalbioticinteractions/Nomer: 0.5.4.” Zenodo. https://doi.org/10.5281/zenodo.8329422.
Created:
2024-06-18



