Wikidata Taxon Items in JSON Lines Format hash://sha256/e76276c283090381fc4b3efe28fc61c28f5bf03db0f3743f7178b999ebccada2 hash://md5/967c79ea605fda781129273a9f229eac

Mendeley Data | Updated 2024-06-29 | Indexed 2024-06-27
Download link:
https://zenodo.org/records/12535891
Description:
Wikidata contains information about taxonomic names, and these names are key to integrating biodiversity datasets across platforms, datasets and institutions.

Content

filename/alias: wikidata-taxon.json.bz2
content ids: hash://sha256/a7592b72c9013d67d655b6ea5d1f4f67f2057dc5b5ee52578a07f58fea835580
             hash://md5/695340d0a730865f3eadd2a21b0fd1e9

filename/alias: wikidata-taxa.sh
content ids: hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962
             hash://md5/1d80083d498d61c8b63fbd46d51f7c5c

filename/alias: Q140.json (example)
content ids: hash://md5/512e9a7c93f1142515627e483d460a95

Provenance

for humans

This dataset contains a subset of Wikidata items that reference the taxonomic name concept https://www.wikidata.org/wiki/Q16521, expressed in JSON Lines format. An example of such an item is https://wikidata.org/wiki/Q140, which describes the taxonomic name associated with Panthera leo, commonly known as Lion (English), León (Spanish), or 狮子 (Chinese). A pretty-printed copy of Q140 is included in this publication as the file "Q140.json". The first 10 lines of "Q140.json" are shown below:

    {
      "type": "item",
      "id": "Q140",
      "labels": {
        "fr": {
          "language": "fr",
          "value": "lion"
        },
        "it": {
          "language": "it",
    ...

This subset was created because the complete Wikidata dump (~85G) did not fit in Zenodo.
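The content ids listed above can be checked against a downloaded copy of a file. The sketch below is only an illustration: it assumes GNU coreutils `sha256sum` is installed, and the helper names `content_id` and `verify_content_id` are made up here and are not part of the publication.

```shell
#!/bin/bash
# Hypothetical helper (not part of the publication): prints the sha256
# content id of a file in the hash://sha256/... notation used above.
# Assumes GNU coreutils sha256sum is installed.
content_id() {
  printf 'hash://sha256/%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Succeeds (exit status 0) only when the file's content id equals the
# expected id passed as the second argument.
verify_content_id() {
  [ "$(content_id "$1")" = "$2" ]
}

# Example use, assuming the archive was saved under its alias:
#   verify_content_id wikidata-taxon.json.bz2 \
#     hash://sha256/a7592b72c9013d67d655b6ea5d1f4f67f2057dc5b5ee52578a07f58fea835580 \
#     && echo "content id matches"
```

Comparing the full hash://sha256/... string, not just the hex digest, keeps the check in the same notation as the listing above.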
for machines

This dataset was generated using the script below, which has content id hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962 or hash://md5/1d80083d498d61c8b63fbd46d51f7c5c

    1 #!/bin/bash
    2 #
    3 # streams Wikidata taxon items (or items containing https://www.wikidata.org/wiki/Q16521)
    4 # from latest data dump in line json (one json object per line)
    5 #
    6 curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2"\
    7 | bunzip2\
    8 | grep -E "Q16521[^0-9]"\
    9 | sed 's/,$//g'\
    10 | bzip2

The script first downloads a recent copy of all Wikidata entities in bzip2-compressed format (line 6), decompresses them (line 7), selects only lines containing "Q16521" followed by a non-digit character (line 8), removes any trailing commas (line 9), and recompresses the output (line 10). As a result, the output contains Wikidata items/entities as described earlier.

Preston, a biodiversity data tracker, was used to (a) track the script, (b) record a script execution, and (c) track the outcome by running:

    #!/bin/bash
    #
    # run the script with id hash://sha256/6f4f...
    #
    preston bash\
     --remote https://linker.bio\
     -c "hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962"

The recording of this process is identified with hash://sha256/e76276c283090381fc4b3efe28fc61c28f5bf03db0f3743f7178b999ebccada2 and hash://md5/967c79ea605fda781129273a9f229eac, and can be reconstructed using

    preston ls\
     --remote https://linker.bio/,https://zenodo.org/records/12535891/files\
     --anchor hash://sha256/e76276c283090381fc4b3efe28fc61c28f5bf03db0f3743f7178b999ebccada2

This is expected to produce:

    <https://preston.guoda.bio> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#SoftwareAgent> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <https://preston.guoda.bio> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Agent> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <https://preston.guoda.bio> <http://purl.org/dc/terms/description> "Preston is a software program that finds, archives and provides access to biodiversity datasets."@en <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <http://purl.org/dc/terms/description> "Executes script and captures stdout"@en <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <http://www.w3.org/ns/prov#startedAtTime> "2024-06-22T10:40:12.016Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <http://www.w3.org/ns/prov#wasStartedBy> <https://preston.guoda.bio> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <https://doi.org/10.5281/zenodo.1410543> <http://www.w3.org/ns/prov#usedBy> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <https://doi.org/10.5281/zenodo.1410543> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Software> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <https://doi.org/10.5281/zenodo.1410543> <http://purl.org/dc/terms/bibliographicCitation> "Jorrit Poelen, Icaro Alzuru, & Michael Elliott. 2021. Preston: a biodiversity dataset tracker (Version 0.8.4) [Software]. Zenodo. https://doi.org/10.5281/zenodo.1410543"@en <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Entity> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/dc/terms/description> "A biodiversity dataset graph archive."@en <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962> <http://purl.org/dc/elements/1.1/format> "text/x-shellscript" .
    <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> <http://www.w3.org/ns/prov#used> <hash://sha256/6f4fac44054d54ec3006d091ba702f872b3f4d013628add98fcca08a3b768962> .
    <urn:uuid:6fa51a99-a137-4387-90b1-23589d7b60ae> <http://www.w3.org/ns/prov#wasGeneratedBy> <urn:uuid:096ba92f-9d5c-4cb1-9a3d-7a95c5228758> .
    <hash://sha256/a7592b72c9013d67d655b6ea5d1f4f67f2057dc5b5ee52578a07f58fea835580> <http://www.w3.org/ns/prov#wasGeneratedBy> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
    <hash://sha256/a7592b72c9013d67d655b6ea5d1f4f67f2057dc5b5ee52578a07f58fea835580> <http://www.w3.org/ns/prov#qualifiedGeneration> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
    <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> <http://www.w3.org/ns/prov#generatedAtTime> "2024-06-22T19:01:55.863Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
    <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Generation> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
    <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> <http://www.w3.org/ns/prov#used> <urn:uuid:6fa51a99-a137-4387-90b1-23589d7b60ae> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
    <urn:uuid:6fa51a99-a137-4387-90b1-23589d7b60ae> <http://purl.org/pav/hasVersion> <hash://sha256/a7592b72c9013d67d655b6ea5d1f4f67f2057dc5b5ee52578a07f58fea835580> <urn:uuid:e0d76a06-5241-4dfe-8429-46164190ab0e> .
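The selection step of the generating script (its grep and sed stages, lines 8 and 9) can be tried on a few hand-made lines without downloading the full dump. The sketch below is an illustration under assumptions: the sample lines are invented and far simpler than real Wikidata entities, and the function names `select_taxon_lines` and `sample_dump` are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around the grep and sed stages of wikidata-taxa.sh:
# keep lines that mention Q16521 followed by a non-digit, then drop the
# trailing comma that separates entities in the dump's JSON array.
select_taxon_lines() {
  grep -E "Q16521[^0-9]" | sed 's/,$//g'
}

# Invented sample input mimicking the dump layout: an opening bracket,
# one entity per line with a trailing comma, and a closing bracket.
# Real entities are much larger and use a different structure.
sample_dump() {
  printf '%s\n' \
    '[' \
    '{"id":"Q140","P31":"Q16521"},' \
    '{"id":"Q999","P31":"Q165210"},' \
    '{"id":"Q5","P31":"Q5"}' \
    ']'
}

sample_dump | select_taxon_lines
```

On this sample only the Q140 line is printed, without its trailing comma. The line mentioning Q165210 is skipped because the character after "Q16521" is a digit, which is why the pattern ends in `[^0-9]`. Note that the pattern matches any mention of Q16521 on a line, so the selection is broader than items whose type is the taxon concept.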
Created:
2024-06-27