A biodiversity dataset graph: BHL
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/3484554
Description:
The intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
This dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-13 and 2019-10-07 using "preston update -u https://biodiversitylibrary.org".
The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .
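Because the parts follow the two-digit hexadecimal pattern preston-[00-ff].tar.gz, their names can be generated rather than typed. The Python sketch below is illustrative only and is not part of Preston. It assumes that each part is reachable at <base URL>/<part name> under the Zenodo files URL used by the "clone" command in this README, which the README does not state explicitly. The function names and the default worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the "clone --remote" command in this README.
# ASSUMPTION: individual parts are served at BASE_URL/<part name>.
BASE_URL = "https://zenodo.org/record/3484555/files"


def part_names():
    """Return the 256 part names preston-00.tar.gz ... preston-ff.tar.gz."""
    return [f"preston-{i:02x}.tar.gz" for i in range(256)]


def download_part(name, dest_dir, base_url=BASE_URL):
    """Download a single part into dest_dir and return its local path."""
    target = Path(dest_dir) / name
    urlretrieve(f"{base_url}/{name}", str(target))
    return target


def download_all(dest_dir, workers=4):
    """Download all parts, a few at a time, and return their local paths."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda name: download_part(name, dest_dir), part_names()))
```

Calling download_all("parts") would fetch every part into a local "parts" folder, provided the URL assumption holds. A small worker count keeps the load on the remote server modest.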
To retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:
$ java -jar preston.jar clone --remote https://zenodo.org/record/3484555/files
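The manual route (concatenate, then extract) can also be sketched in Python. This is a minimal sketch, not part of Preston: the function names are hypothetical, and it assumes all downloaded parts sit in one local folder. It opens the combined file with ignore_zeros=True so that, if each part is a complete tar archive in its own right, extraction continues past the end-of-archive padding between parts instead of stopping after the first one.

```python
import shutil
import tarfile
from pathlib import Path


def concatenate_parts(part_dir, combined):
    """Append every preston-XX.tar.gz part (XX = 00..ff) to one file, in name order."""
    # The pattern matches only two-hex-digit part names, so a combined
    # preston.tar.gz in the same folder is never appended to itself.
    parts = sorted(Path(part_dir).glob("preston-[0-9a-f][0-9a-f].tar.gz"))
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract_archive(combined, dest):
    """Extract the combined archive into dest (for example a "data" folder)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # ignore_zeros makes tarfile read on past end-of-archive blocks, which
    # matters when several complete tar archives were concatenated.
    with tarfile.open(combined, mode="r:gz", ignore_zeros=True) as archive:
        archive.extractall(dest)
```

For example, concatenate_parts(".", "preston.tar.gz") followed by extract_archive("preston.tar.gz", "data") mirrors the two manual steps. Whether the archive members already carry a leading "data" directory is not stated here, so the destination may need adjusting after a first look at the extracted paths.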
After that, verify the index of the archive by reproducing the following provenance log history:
$ java -jar preston.jar history
<0659a54f-b713-4f86-a917-5be166a14110> .
.
.
.
.
.
.
To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles the lines shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.
$ java -jar preston.jar verify
hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca file:/home/preston/preston-bhl/data/e0/c1/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca OK CONTENT_PRESENT_VALID_HASH 49458087
hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 file:/home/preston/preston-bhl/data/1a/57/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 OK CONTENT_PRESENT_VALID_HASH 25745
hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c file:/home/preston/preston-bhl/data/85/ef/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c OK CONTENT_PRESENT_VALID_HASH 519892
hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 file:/home/preston/preston-bhl/data/25/1e/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 OK CONTENT_PRESENT_VALID_HASH 787414
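With many lines of output, checking each one by eye is impractical. The Python sketch below shows one way to automate the check, and to recompute a content hash independently. It is not Preston's implementation: the function names are hypothetical, the fields are assumed to be whitespace-separated as in the sample above, and the folder layout (data/<first two hex digits>/<next two hex digits>/<full hash>) is inferred from the sample paths only.

```python
import hashlib
from pathlib import Path

VALID_STATE = "CONTENT_PRESENT_VALID_HASH"
HASH_PREFIX = "hash://sha256/"


def suspect_lines(lines):
    """Return the non-empty lines that lack the CONTENT_PRESENT_VALID_HASH field."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.strip() and VALID_STATE not in line.split()
    ]


def content_path(data_dir, hash_uri):
    """Map hash://sha256/<hex> to data_dir/<hex[0:2]>/<hex[2:4]>/<hex>.

    ASSUMPTION: this layout is inferred from the sample verify output.
    """
    if not hash_uri.startswith(HASH_PREFIX):
        raise ValueError(f"not a sha256 hash URI: {hash_uri}")
    hex_digest = hash_uri[len(HASH_PREFIX):]
    return Path(data_dir) / hex_digest[:2] / hex_digest[2:4] / hex_digest


def content_matches(data_dir, hash_uri, chunk_size=1 << 20):
    """Recompute the SHA-256 of the stored file and compare it with the URI."""
    digest = hashlib.sha256()
    with open(content_path(data_dir, hash_uri), "rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return HASH_PREFIX + digest.hexdigest() == hash_uri
```

If the output of "java -jar preston.jar verify" is saved to a file, passing its lines to suspect_lines should return an empty list for an intact archive. content_matches offers a spot check of a single hash that does not rely on Preston at all.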
Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".
Files in this data publication:
--- start of file descriptions ---
-- description of archive and its contents (this file) --
README
-- executable java jar containing preston[2] v0.1.8. --
preston.jar
-- preston archives containing BHL data files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz
-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
2b1104cb7749e818c9afca78391b2d0099bbb0a32f2b348860a335cd2f8f6800
4081bc59dff58d63f6a86c623cb770f01e9a355a42495b205bcb538cd526190f
6f99a1388823fca745c9e22ac21e2da909a219aa1ace55170fa9248c0276903c
82903464889fea7c53f53daedf4e41fa31092f82619edeb3415eb2b473f74af3
9e8c86243df39dd4fe82a3f814710eccf73aa9291d050415408e346fa2b09e70
bcec6df2ea7f74e9a6e2830d0072e6b2fbe65323d9ddb022dd6e1349c23996e2
--- end of file descriptions ---
References
[1] Biodiversity Heritage Library (BHL, https://biodiversitylibrary.org) accessed from 2019-05-13 to 2019-10-07 with provenance hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
Created:
2023-06-02



