Wikidata Taxon Items in JSON Lines Format hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c hash://md5/bdcc99bfedfd34abdfdd3802182f225c
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12535890
Description:
Wikidata contains information about taxonomic names, and these taxonomic names are key to integrating biodiversity datasets across different platforms, datasets and institutions.
Content
filename/alias: content ids
wikidata-taxon.json.bz2:
  hash://sha256/701a1382e304a6b1bb38fe828d82f7b8b562c77f918f33097966e38bacf0b2e7
  hash://md5/d5bad3553470506f3bde383566a5dea3
wikidata-taxa.sh:
  hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962
  hash://md5/1d80083d498d61c8b63fbd46d51f7c5c
Q140.json (example):
  hash://md5/44ab0031091fb96caa063e3fe41a85f2
Provenance
for humans
This dataset contains a subset of Wikidata items referencing the taxonomic name concept https://www.wikidata.org/wiki/Q16521 and is expressed in JSON Lines format.
An example of such an item is https://wikidata.org/wiki/Q140, which describes the taxonomic name associated with Panthera leo, commonly known as Lion (English), León (Spanish), or 狮子 (Chinese). A pretty-printed example of Q140 is included in this publication as the file "Q140.json". The first 10 lines of "Q140.json" are shown below:
{
  "type": "item",
  "id": "Q140",
  "labels": {
    "fr": {
      "language": "fr",
      "value": "lion"
    },
    "it": {
      "language": "it",
...
This subset was created because the full Wikidata dump (~85G) did not fit in Zenodo.
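Because the dataset is in JSON Lines format, each line can be processed on its own with ordinary stream tools. The sketch below lists the item id of each record. The function name item_ids is hypothetical, the two input lines are synthetic and heavily abbreviated ("Q0" is a placeholder, not a real item), and the sed pattern assumes that every line starts with {"type":"item","id":"Q...", which is how compact dump lines are commonly laid out but is not guaranteed by this publication.

```shell
#!/bin/bash
# Hypothetical helper: prints the item id of each JSON Lines record on stdin.
# Assumption: every record starts with {"type":"item","id":"Q<digits>" ...
# Lines that do not match this prefix are silently skipped.
item_ids() {
  sed -E -n 's/^\{"type":"item","id":"(Q[0-9]+)".*/\1/p'
}

# Synthetic, abbreviated records standing in for real lines of the dataset.
ids="$(printf '%s\n' \
  '{"type":"item","id":"Q140","labels":{"fr":{"language":"fr","value":"lion"}}}' \
  '{"type":"item","id":"Q0","labels":{}}' | item_ids)"
printf '%s\n' "$ids"
```

Against the real archive, the same function could be fed with bzcat wikidata-taxon.json.bz2 | head -n 3 | item_ids, which avoids decompressing the whole file to disk. For anything beyond a quick look, a proper JSON parser is a safer choice than a regular expression.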
for machines
This dataset was generated using the script below with content id hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962 or hash://md5/1d80083d498d61c8b63fbd46d51f7c5c
1 #!/bin/bash
2 #
3 # streams Wikidata taxon items (or items containing https://www.wikidata.org/wiki/Q16521)
4 # from latest data dump in line json (one json object per line)
5 #
6 curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2"\
7 | bunzip2\
8 | grep -E "Q16521[^0-9]"\
9 | sed 's/,$//g'\
10 | bzip2
The script first streams the latest dump of all Wikidata entities in bzip2-compressed format (line 6), decompresses it (line 7), keeps only lines in which "Q16521" is followed by a non-digit character, so that longer ids such as Q165210 do not match (line 8), removes trailing commas (line 9), and recompresses the output (line 10). The output therefore contains the Wikidata items/entities described earlier, one JSON object per line.
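The effect of the grep and sed steps can be tried offline on a small input. This is a sketch under stated assumptions: the sample lines are synthetic and greatly simplified (real entities are far larger), the layout of one entity per line inside a JSON array mimics the dump, and the function name filter_taxa is hypothetical.

```shell
#!/bin/bash
# Synthetic stand-in for a few dump lines: a JSON array with one entity per
# line, each entity line ending in a comma. The claim structure is simplified.
sample='[
{"id":"Q140","claims":{"P31":"Q16521"}},
{"id":"Q1","claims":{"P31":"Q165210"}},
{"id":"Q2","claims":{"P31":"Q5"}}
]'

# Same filter as lines 8 and 9 of the script above (hypothetical function name).
filter_taxa() {
  grep -E "Q16521[^0-9]" | sed 's/,$//g'
}

filtered="$(printf '%s\n' "$sample" | filter_taxa)"
printf '%s\n' "$filtered"
# prints: {"id":"Q140","claims":{"P31":"Q16521"}}
```

Only the first entity survives: the second mentions Q165210, where "Q16521" is followed by a digit, and the third does not mention Q16521 at all. The array brackets are dropped and the trailing comma is removed, which is what turns the dump into JSON Lines.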
Preston, a biodiversity data tracker, was used to (a) track the script, (b) record an execution of the script, and (c) track the outcome, by running:
#!/bin/bash
#
# run the script with id hash://sha256/13ff...
#
preston bash\
--remote https://linker.bio\
-c "hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c"
The recording of this process is identified with hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c and hash://md5/bdcc99bfedfd34abdfdd3802182f225c, and can be reconstructed using:
preston ls\
--remote https://linker.bio/,https://zenodo.org/records/13920038/files\
--anchor hash://sha256/13ffa9679bae381aa5914d810638fb5a0c75d71f5f7d47f38b3c00d750c88b9c
This is expected to produce:
.
.
"Preston is a software program that finds, archives and provides access to biodiversity datasets."@en .
.
"Executes script and captures stdout"@en .
"2024-10-10T16:37:36.659Z"^^ .
.
.
.
"Jorrit Poelen, Icaro Alzuru, & Michael Elliott. 2018-2024. Preston: a biodiversity dataset tracker (Version 0.9.9-SNAPSHOT) [Software]. Zenodo. https://doi.org/10.5281/zenodo.1410543"@en .
.
"A biodiversity dataset graph archive."@en .
.
"text/x-shellscript" .
.
.
.
.
"2024-10-11T02:08:03.739Z"^^ .
.
.
.
.
.
"Preston is a software program that finds, archives and provides access to biodiversity datasets."@en .
.
"Executes script and captures stdout"@en .
"2024-06-22T10:40:12.016Z"^^ .
.
.
.
"Jorrit Poelen, Icaro Alzuru, & Michael Elliott. 2021. Preston: a biodiversity dataset tracker (Version 0.8.4) [Software]. Zenodo. https://doi.org/10.5281/zenodo.1410543"@en .
.
"A biodiversity dataset graph archive."@en .
"text/x-shellscript" .
.
.
.
.
"2024-06-22T19:01:55.863Z"^^ .
.
.
.
Created:
2024-10-11



