
Wikidata Taxon Items in JSON Lines Format hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c hash://md5/bdcc99bfedfd34abdfdd3802182f225c

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/12535890
Resource description:
Wikidata contains information about taxonomic names, and these taxonomic names are key to integrating biodiversity datasets across different platforms, datasets and institutions.

Content

filename/alias, followed by content ids:

wikidata-taxon.json.bz2
    hash://sha256/701a1382e304a6b1bb38fe828d82f7b8b562c77f918f33097966e38bacf0b2e7
    hash://md5/d5bad3553470506f3bde383566a5dea3
wikidata-taxa.sh
    hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962
    hash://md5/1d80083d498d61c8b63fbd46d51f7c5c
Q140.json (example)
    hash://md5/44ab0031091fb96caa063e3fe41a85f2

Provenance

For humans

This dataset contains a subset of Wikidata items that reference the taxonomic name concept https://www.wikidata.org/wiki/Q16521, expressed in JSON Lines format.

An example of such an item is https://wikidata.org/wiki/Q140, which describes the taxonomic name associated with Panthera leo, commonly known as Lion (English), León (Spanish), or 狮子 (Chinese). A pretty-printed example of Q140 is included in this publication as the file "Q140.json". The first 10 lines of "Q140.json" are shown below:

    {
      "type": "item",
      "id": "Q140",
      "labels": {
        "fr": {
          "language": "fr",
          "value": "lion"
        },
        "it": {
          "language": "it",
    ...

The subset was created because all of Wikidata (~85G) did not fit in Zenodo.
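The content ids listed above pair a hash algorithm with a hex digest, so a downloaded copy of a file can be checked locally. The sketch below assumes that a `hash://sha256/...` id is the plain SHA-256 hex digest of the file's bytes and that `sha256sum` (GNU coreutils) is installed. The function name `verify_content_id` is hypothetical and is not a Preston command.

```shell
#!/bin/bash
# Sketch: compare a file's SHA-256 digest with a hash://sha256/... content id.
# Assumption: the content id is the hex SHA-256 digest of the file's bytes.
# verify_content_id is a hypothetical helper, not part of Preston.
verify_content_id() {
  local file="$1"
  local content_id="$2"
  local expected="${content_id#hash://sha256/}"   # strip the URI prefix
  local actual
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)" # first field is the digest
  [ "$actual" = "$expected" ]                     # exit status 0 on a match
}

# Example use, after downloading wikidata-taxa.sh into the current directory:
# verify_content_id wikidata-taxa.sh \
#   "hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962" \
#   && echo "content id matches"
```

The function only reports success or failure through its exit status, which keeps it usable in pipelines and `if` statements.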
For machines

This dataset was generated using the script below, which has content id hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962 or hash://md5/1d80083d498d61c8b63fbd46d51f7c5c:

     1 #!/bin/bash
     2 #
     3 # streams Wikidata taxon items (or items containing https://www.wikidata.org/wiki/Q16521)
     4 # from latest data dump in line json (one json object per line)
     5 #
     6 curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2"\
     7 | bunzip2\
     8 | grep -E "Q16521[^0-9]"\
     9 | sed 's/,$//g'\
    10 | bzip2

The script first downloads a recent copy of all Wikidata entities in bzip2 compressed format (line 6), decompresses them (line 7), selects only lines containing "Q16521" (line 8), removes any trailing commas (line 9), and recompresses the output (line 10). The output therefore contains the Wikidata items/entities described earlier.

Preston, a biodiversity data tracker, was used to (a) track the script, (b) record a script execution, and (c) track the outcome by running:

    #!/bin/bash
    #
    # run the script with id hash://sha256/13ff...
    #
    preston bash\
     --remote https://linker.bio\
     -c "hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c"

The recording of this process is identified with hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c and hash://md5/bdcc99bfedfd34abdfdd3802182f225c, and can be reconstructed using:

    preston ls\
     --remote https://linker.bio/,https://zenodo.org/records/13920038/files\
     --anchor hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c

This is expected to produce statements that include the following values:

    "Preston is a software program that finds, archives and provides access to biodiversity datasets."@en
    "Executes script and captures stdout"@en
    "2024-10-10T16:37:36.659Z"
    "Jorrit Poelen, Icaro Alzuru, & Michael Elliott. 2018-2024. Preston: a biodiversity dataset tracker (Version 0.9.9-SNAPSHOT) [Software]. Zenodo. https://doi.org/10.5281/zenodo.1410543"@en
    "A biodiversity dataset graph archive."@en
    "text/x-shellscript"
    "2024-10-11T02:08:03.739Z"

    "Preston is a software program that finds, archives and provides access to biodiversity datasets."@en
    "Executes script and captures stdout"@en
    "2024-06-22T10:40:12.016Z"
    "Jorrit Poelen, Icaro Alzuru, & Michael Elliott. 2021. Preston: a biodiversity dataset tracker (Version 0.8.4) [Software]. Zenodo. https://doi.org/10.5281/zenodo.1410543"@en
    "A biodiversity dataset graph archive."@en
    "text/x-shellscript"
    "2024-06-22T19:01:55.863Z"
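The filter steps of the generation script (its `grep` and `sed` stages) can be tried on a small sample without downloading the full dump. The sketch below assumes the dump is a JSON array with one entity per line, each entity line ending in a comma. The three sample entities are invented and heavily simplified, and the function name `filter_taxa` is hypothetical.

```shell
#!/bin/bash
# Sketch: apply the script's grep and sed stages to a tiny made-up sample.
# Assumption: the dump is a JSON array with one entity per line, and entity
# lines end in a comma. The entities below are invented and simplified.
filter_taxa() {
  # Keep lines mentioning Q16521 followed by a non-digit (so Q165210 is
  # not matched), then drop the trailing comma left over from the array.
  grep -E "Q16521[^0-9]" | sed 's/,$//g'
}

sample='[
{"id":"Q140","claims":{"P31":"Q16521"}},
{"id":"Q1","claims":{"P31":"Q165210"}},
{"id":"Q2","claims":{"P31":"Q5"}}
]'

printf '%s\n' "$sample" | filter_taxa
# prints: {"id":"Q140","claims":{"P31":"Q16521"}}
```

Because the match is textual, any line that mentions Q16521 anywhere is kept, which is consistent with the script's own comment about "items containing" that identifier.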
Created:
2024-10-11