
Wikidata dump extension (sitelinks)

Source: NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/record/3840621
Description:
The dataset contains mappings between Wikidata entities and Wikipedia sections. The mappings come in addition to the existing Wikidata sitelinks referencing Wikipedia pages. The creation of the present dataset stems from the observation that only a fraction of Wikidata entities have a corresponding Wikipedia article in any language (we refer to the remaining entities, without an article, as orphans). However, a substantial number of orphan entities are in fact covered in Wikipedia, but not at the page level: they are described within existing Wikipedia articles as sections, subsections, and paragraphs of a more generic concept or fact. The dataset provides a fine-grained mapping between Wikidata orphan entities and Wikipedia (sub)sections.

Mappings are provided for 15 languages: Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Ukrainian, Vietnamese.

The dataset is available in JSON and RDF formats and complies with the Wikibase data model. In the JSON representation, an entity contains two fields: id (the unique identifier of an entity) and sitelinks (links to Wikipedia pages). Each sitelink record comprises three fields: site, title, and url. A section title is appended to the page title, separated by a # symbol. Such a compound title is then URL-encoded and added to the URL path. Following the Wikidata guidelines, each entity is encoded as a single line. Example:

{ "id": "Q3320792", "sitelinks": { "dewiki": { "site": "dewiki", "title": "Orestie#Agamemnon", "url": "https://de.wikipedia.org/wiki/Orestie#Agamemnon" }, "frwiki": { "site": "frwiki", "title": "Orestie#''Agamemnon''", "url": "https://fr.wikipedia.org/wiki/Orestie#Agamemnon" }, "enwiki": { "site": "enwiki", "title": "Aeschylus#''Agamemnon''", "url": "https://en.wikipedia.org/wiki/Aeschylus#Agamemnon" } } }

The RDF dump is serialized using the Turtle format and stores nodes describing Wikipedia links.
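Section titles are added in the same manner as described above. Example:

<https://de.wikipedia.org/wiki/Orestie#Agamemnon> a schema:Article ;
    schema:about wd:Q3320792 ;
    schema:inLanguage "de" ;
    schema:isPartOf <https://de.wikipedia.org/> ;
    schema:name "Orestie#Agamemnon"@de .
<https://de.wikipedia.org/> wikibase:wikiGroup "wikipedia" .

<https://fr.wikipedia.org/wiki/Orestie#Agamemnon> a schema:Article ;
    schema:about wd:Q3320792 ;
    schema:inLanguage "fr" ;
    schema:isPartOf <https://fr.wikipedia.org/> ;
    schema:name "Orestie#''Agamemnon''"@fr .
<https://fr.wikipedia.org/> wikibase:wikiGroup "wikipedia" .

<https://en.wikipedia.org/wiki/Aeschylus#Agamemnon> a schema:Article ;
    schema:about wd:Q3320792 ;
    schema:inLanguage "en" ;
    schema:isPartOf <https://en.wikipedia.org/> ;
    schema:name "Aeschylus#''Agamemnon''"@en .
<https://en.wikipedia.org/> wikibase:wikiGroup "wikipedia" .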
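
As an illustration of the JSON layout described above, the following is a minimal Python sketch of how one entity line might be read and its compound titles split into a page part and a section part. The helper names split_title and parse_entity_line are hypothetical and not part of the dataset or of any Wikidata tooling; the sample line is a shortened version of the example above. The sketch assumes that the first # in a title separates the page from the section.

```python
import json


def split_title(title):
    """Split a compound 'Page#Section' title into (page, section).

    Assumption: the first '#' separates page from section.
    Returns (page, None) when the title has no section part.
    """
    page, sep, section = title.partition("#")
    return page, (section if sep else None)


def parse_entity_line(line):
    """Parse one single-line JSON entity.

    Returns (entity_id, {site: (page, section, url)}).
    """
    entity = json.loads(line)
    links = {}
    for site, link in entity["sitelinks"].items():
        page, section = split_title(link["title"])
        links[site] = (page, section, link["url"])
    return entity["id"], links


# Shortened sample taken from the example in the description.
sample = (
    '{"id": "Q3320792", "sitelinks": {"dewiki": {"site": "dewiki", '
    '"title": "Orestie#Agamemnon", '
    '"url": "https://de.wikipedia.org/wiki/Orestie#Agamemnon"}}}'
)

entity_id, links = parse_entity_line(sample)
for site, (page, section, url) in links.items():
    print(entity_id, site, page, section, url)
```

Because each entity occupies a single line, a full dump can be processed line by line with the same function, without loading the whole file into memory.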
Created:
2020-06-08