A biodiversity dataset graph: Biological Associations in TaxonWorks hash://sha256/e4a47c067d6c125da60c9a1b92b5eecdea539cb8666cd3aed99db347ae5b8ed0 hash://md5/686007de79cc2a49ab23fd3debe56e3f
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/8252843
Description:
The intended use of this archive is to facilitate (meta-)analysis of Biological Associations captured in TaxonWorks [1]. TaxonWorks is an integrated web-based workbench for taxonomists and biodiversity scientists. It allows you to capture, organize, and enrich your data; share it with collaborators; and package it for analysis and publication.
This dataset provides versioned snapshots of the TaxonWorks network as tracked by Preston [2,3,4] on 2024-05-07 using:
preston track -u https://sfg.taxonworks.org
In addition, this dataset provides a processed version of the biological associations, produced with the "preston tw-stream" command by the following bash script:
#!/bin/bash
#
# Generates GloBI interaction JSON Lines from provided provenance log as generated by preston tw-stream.
#
/usr/local/bin/preston cat hash://sha256/c1b081afa6ea0f60570c24cca85c4d9acd91eeefe36b9cacd1fe53b6893ea154\
| /usr/local/bin/preston tw-stream
The script itself was executed using:
cat transform.sh | preston bash
The execution of this transform.sh script (with content id hash://sha256/6dfe3c4ebf877bed73aebbe88c7d388bf894c569578ed7b28ca68e57a6afe43b), as well as its results, is also captured within this dataset. An RDF/quads-formatted, machine-readable description of the workflow execution can be retrieved using:
preston cat hash://sha256/e4a47c067d6c125da60c9a1b92b5eecdea539cb8666cd3aed99db347ae5b8ed0
The resulting JSON Lines file has content id (or signature) hash://sha256/4c2b8642251ced5985660d63c565efa6e5a9bf3d12b3b0c0d9ac577905f5e897 and is also included as interactions.json to facilitate access.
The first JSON record can be printed using:
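A local copy of interactions.json can be checked against the content id above without preston. The following is a minimal sketch, assuming that GNU coreutils "sha256sum" is available and that the content id is the hex-encoded sha256 of the file bytes prefixed with hash://sha256/. The helper names content_id and has_content_id are hypothetical and not part of preston.

```shell
#!/bin/bash
# Print the sha256 content id of a file in the hash://sha256/... form.
# Assumption: GNU coreutils "sha256sum" is on the PATH.
content_id() {
  printf 'hash://sha256/%s\n' "$(sha256sum < "$1" | cut -d' ' -f1)"
}

# Succeed (exit status 0) only if the file's content id equals the expected one.
has_content_id() {
  [ "$(content_id "$1")" = "$2" ]
}

# Example, for a local copy of interactions.json:
# has_content_id interactions.json \
#   hash://sha256/4c2b8642251ced5985660d63c565efa6e5a9bf3d12b3b0c0d9ac577905f5e897 \
#   && echo "content id matches"
```

A mismatch means only that the local bytes differ from the published file. It does not say where the difference came from.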
preston cat hash://sha256/4c2b8642251ced5985660d63c565efa6e5a9bf3d12b3b0c0d9ac577905f5e897\
| head -n1\
| jq .
or, provided that interactions.json has a content id starting with hash://sha256/4c2b86..., using:
cat interactions.json\
| head -n1\
| jq .
This produces the following (formatted) JSON object:
{
  "http://www.w3.org/ns/prov#wasDerivedFrom": "hash://sha256/fdbf13dc5f3d9c5afbc03db62699e2ce2724c499b7d91d8b0bf31e39409b153a",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "application/vnd.taxonworks+json",
  "referenceId": "https://sfg.taxonworks.org/api/v1/sources/213218",
  "interactionId": "https://sfg.taxonworks.org/api/v1/biological_associations/227664",
  "taxonRootsResolved": 2,
  "referenceResolved": true,
  "referenceCitation": "@article{213218,\n author = {Monzen, Kota},\n journal = {Annual Report of the Gakugei Faculty of the Iwate University},\n pages = {24-38},\n title = {Revision of the Japanese gall wasps with the descriptions of new genus, subgenus, species and subspecies (II). Cynipidae (Cynipinae) Hymenoptera.},\n volume = {6},\n year = {1954}\n}\n",
  "interactionTypeId": "gid://taxon-works/BiologicalRelationship/69",
  "interactionTypeName": "gall",
  "sourceTaxonName": "Neuroterus hakonensis",
  "sourceTaxonId": "gid://taxon-works/TaxonName/1174121",
  "sourceTaxonRank": "species",
  "sourceTaxonAuthorship": "Ashmead, 1904",
  "sourceTaxonPath": "Root | Cynipidae | Neuroterus | Neuroterus hakonensis",
  "sourceTaxonPathIds": "gid://taxon-works/TaxonName/623170 | gid://taxon-works/TaxonName/1170060 | gid://taxon-works/TaxonName/1170097 | gid://taxon-works/TaxonName/1174121",
  "sourceTaxonPathNames": "nomenclatural rank | family | genus | species",
  "targetTaxonName": "Quercus",
  "targetTaxonId": "gid://taxon-works/TaxonName/1173543",
  "targetTaxonRank": "genus",
  "targetTaxonAuthorship": "",
  "targetTaxonPath": "Root | Fagaceae | Quercus",
  "targetTaxonPathIds": "gid://taxon-works/TaxonName/623170 | gid://taxon-works/TaxonName/1173542 | gid://taxon-works/TaxonName/1173543",
  "targetTaxonPathNames": "nomenclatural rank | family | genus"
}
In this example, a claim is made that, according to https://sfg.taxonworks.org/api/v1/sources/213218 [6], Neuroterus hakonensis (a gall wasp) has a "gall" association with the genus Quercus (oak trees).
In total, 237,068 such claims can be found in the generated resource with alias interactions.json and content id starting with hash://sha256/4c2b86... .
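Because interactions.json is a JSON Lines file, the number of claims can be re-derived by counting lines, and a rough breakdown by interaction type can be obtained with standard text tools. The sketch below is illustrative only. The helper names count_claims and tally_interaction_types are hypothetical, and the grep pattern assumes that each record carries one "interactionTypeName" key whose value is a plain string without escaped quotes. A JSON-aware tool such as jq is more robust.

```shell
#!/bin/bash
# Count records in a JSON Lines file (one JSON object per line).
count_claims() {
  grep -c '' "$1"
}

# Tally records by interaction type, most frequent first.
# Assumption: values of "interactionTypeName" contain no escaped quotes.
tally_interaction_types() {
  grep -o '"interactionTypeName": *"[^"]*"' "$1" \
    | sed 's/^"interactionTypeName": *"//; s/"$//' \
    | sort | uniq -c | sort -rn
}

# Example, for a local copy of interactions.json:
# count_claims interactions.json
# tally_interaction_types interactions.json | head
```

If the file holds exactly one record per line, the count reported by count_claims should agree with the total stated above.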
In addition, the archive preston.tar.gz is included to allow for batch download. The archive contains three types of files: index files, provenance logs and data files. Index files have also been individually included in this dataset publication to facilitate remote access. Index files link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the TaxonWorks content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .
To retrieve and verify the downloaded TaxonWorks biodiversity dataset graph, download preston.tar.gz. Then, extract the archive into a "data" folder. Alternatively, you can use the preston [2] command-line tool to "clone" this dataset using:
java -jar preston.jar clone --remote https://zenodo.org/record/11151783/files
After that, verify the index of the archive by reproducing the following provenance log history:
java -jar preston.jar history --log tsv
to be:
hash://sha256/e4a47c067d6c125da60c9a1b92b5eecdea539cb8666cd3aed99db347ae5b8ed0 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/c1b081afa6ea0f60570c24cca85c4d9acd91eeefe36b9cacd1fe53b6893ea154
hash://sha256/c1b081afa6ea0f60570c24cca85c4d9acd91eeefe36b9cacd1fe53b6893ea154 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/a4d651aac5220487835e6178511886e98b845b2d98cb7c5447fb2b042e0654d2
hash://sha256/a4d651aac5220487835e6178511886e98b845b2d98cb7c5447fb2b042e0654d2 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/ab7550368905e7c919e70a306efbb97719a1edbba2cfe4c4515f635ebc0be4bb
hash://sha256/ab7550368905e7c919e70a306efbb97719a1edbba2cfe4c4515f635ebc0be4bb http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/ff5e709305e593c87711e897b6341b94e775e2f312aa6d4ae5ed6120babd6f5e
urn:uuid:0659a54f-b713-4f86-a917-5be166a14110 http://purl.org/pav/hasVersion hash://sha256/ff5e709305e593c87711e897b6341b94e775e2f312aa6d4ae5ed6120babd6f5e
To check the integrity of the extracted archive, run the "preston verify" command shown below and confirm that each line it produces includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.
java -jar preston.jar verify
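For a large archive, scanning the verify output by eye is impractical. The following sketch filters that output for lines that lack the marker. The helper name report_invalid is hypothetical and not part of preston, and the sketch assumes only what is stated above: that each line for valid content includes the text CONTENT_PRESENT_VALID_HASH.

```shell
#!/bin/bash
# Read "preston verify" output on stdin and print every line that lacks
# the CONTENT_PRESENT_VALID_HASH marker.
# Exit status: 0 if no such line was found, 1 otherwise.
# Note: empty input also yields 0, so check that verify produced output.
report_invalid() {
  ! grep -v -F 'CONTENT_PRESENT_VALID_HASH'
}

# Example:
# java -jar preston.jar verify | report_invalid \
#   && echo "no lines without CONTENT_PRESENT_VALID_HASH"
```

Any lines printed by report_invalid point at content that is missing or does not match its hash and should be inspected individually.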
Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short.
Files in this data publication:
--- start of file descriptions ---
-- description of archive and its contents (a rendition of this file) --
README
-- biological associations indexed from TaxonWorks expressed in a GloBI [5] compatible JSON Lines file --
interactions.json
-- first 10 biological associations indexed from TaxonWorks expressed in a GloBI [5] compatible JSON Lines file --
interactions-10.json
-- executable java jar containing preston [2,3,4] v0.8.5-SNAPSHOT. --
preston.jar
-- preston archive containing TaxonWorks data files, associated provenance logs and a provenance index --
preston.tar.gz
-- individual provenance index files --
1fed32bf78298d7ecc3d9f36d106f1d7d7773a8b9a5e47af6632f36c1f82adb5
29306c5c144c3d7fd21be344d8b6b554b6f6efa3b8f8f5c0b27cdf0e88785652
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
d31ff1ef1dea88c5952181a4f30e7ea7862873aa5f66430451275aa6d08d329e
deb84d69224af488da585186f88cafc58e978db5f9897de624cc9b02c0c83742
e9c34683f1e826f68f841f3419bd5ee9c0fa18be04713a6fd3364f226c7c5f2f
f98d36a9dc7bd833c93b3b61130865628f7bc2f7bb0920e95afcd16fba3dc6a8
ffb41d48979ceb964fbfbeb68cb60b584b759950087fdcc012521b866249bc39
--- end of file descriptions ---
This work is funded in part by grants NSF OAC 1839201, NSF DBI 1901932, NSF DBI 1901926, and NSF DBI 2102006 from the National Science Foundation.
Created:
2024-05-08



