Wikidata dump extension (enwiki section links)
Source: Mendeley Data. Updated 2024-05-10; indexed 2024-06-30.
Download link:
https://zenodo.org/records/7360787
Resource description:
The dataset contains mappings between Wikidata entities and Wikipedia sections. These mappings complement the existing Wikidata sitelinks, which reference whole Wikipedia pages. The dataset was created because only a fraction of Wikidata entities have a corresponding Wikipedia article in any language (we refer to the remaining entities, which have no article, as orphans). However, a substantial number of orphan entities are in fact covered by Wikipedia, just not at the page level: they are described within existing Wikipedia articles as sections, subsections, or paragraphs of a more generic concept or fact. The dataset provides a fine-grained mapping between Wikidata orphan entities and Wikipedia (sub)sections. Mappings are provided for English Wikipedia.
The dataset is available in JSON and RDF formats and complies with the Wikibase data model. In the JSON representation, an entity contains two fields: id (the unique identifier of the entity) and sectionlinks (links to Wikipedia sections). The sectionlinks field maps a site key (here enwiki) to a list of records[1], each with three fields: site, title, and url. The section title is appended to the page title, separated by a # symbol; the compound title is then URL-encoded (spaces appear as underscores in the example below) and appended to the base URL. Following the Wikidata guidelines, each entity is encoded as a single line in the dump; the example below is spread over several lines for readability. Example:
{
"id": "Q715509",
"sectionlinks": {
"enwiki": [
{
"site": "enwiki",
"title": "Places in Harry Potter#Azkaban",
"url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban"
}
]
}
}
The RDF dump is serialized using the Turtle format and stores nodes describing Wikipedia links. Section titles are added in the same manner as described above. Example:
<https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban> a schema:Article ;
schema:about wd:Q715509 ;
schema:inLanguage "en" ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name "Places in Harry Potter#Azkaban"@en .
<https://en.wikipedia.org/> wikibase:wikiGroup "wikipedia" .
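To make the title-to-URL and RDF conventions concrete, here is a minimal Python sketch that turns a compound Page#Section title into a section URL and renders it as a Turtle node shaped like the example above. The encoding rules are assumptions inferred from that example (spaces become underscores; the page and section parts are percent-encoded separately, leaving a hypothetical SAFE_CHARS set unescaped), not the dataset authors' actual generation code. The names section_url, turtle_escape and section_node are illustrative, and the schema:, wd: and wikibase: prefixes are assumed to be declared elsewhere in the dump.

```python
from urllib.parse import quote

# Characters left unescaped in each title part. This set is an ASSUMPTION
# (loosely modelled on MediaWiki-style URLs), not taken from the dataset.
SAFE_CHARS = "()/:,"


def _encode_part(part: str) -> str:
    """Replace spaces with underscores, then percent-encode the rest."""
    return quote(part.replace(" ", "_"), safe=SAFE_CHARS)


def section_url(compound_title: str, base: str = "https://en.wikipedia.org/wiki/") -> str:
    """Build a section URL from a compound 'Page#Section' title (hypothetical helper)."""
    # Split on the first '#': the page goes into the path and the section
    # becomes the URL fragment. Without a '#', only the page URL is built.
    page, sep, section = compound_title.partition("#")
    return base + _encode_part(page) + sep + _encode_part(section)


def turtle_escape(text: str) -> str:
    """Escape backslashes and double quotes for a quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def section_node(entity_id: str, compound_title: str) -> str:
    """Render one English section link as Turtle, shaped like the example above.

    Assumes the schema:, wd: and wikibase: prefixes are declared elsewhere.
    """
    url = section_url(compound_title)
    return (
        f"<{url}> a schema:Article ;\n"
        f"\tschema:about wd:{entity_id} ;\n"
        '\tschema:inLanguage "en" ;\n'
        "\tschema:isPartOf <https://en.wikipedia.org/> ;\n"
        f'\tschema:name "{turtle_escape(compound_title)}"@en .\n'
        '<https://en.wikipedia.org/> wikibase:wikiGroup "wikipedia" .'
    )


print(section_node("Q715509", "Places in Harry Potter#Azkaban"))
```

Under these assumptions, section_url("Racket (sports equipment)#Tennis") yields https://en.wikipedia.org/wiki/Racket_(sports_equipment)#Tennis. A converter meant to reproduce the published dump would need to match whatever exact escaping rules the dataset authors applied.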
[1] Unlike sitelinks, where each entity can be mapped to a single Wikipedia page per wiki (a one-to-one mapping), sectionlinks allow a one-to-many mapping: an entity can be mapped to multiple sections. For example, the Tennis racket concept can be mapped to both the Tennis#Rackets and the Racket (sports equipment)#Tennis sections.
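The JSON layout and the one-to-many mapping can be illustrated with a small, standard-library-only Python sketch that streams a dump line by line and groups section links per entity. It assumes the file follows the Wikidata JSON dump convention of one entity per line, possibly wrapped in [ and ] lines with trailing commas; whether this dataset uses that framing is an assumption. The functions iter_section_links and sections_by_entity and the sample data are hypothetical, and Q_HYPOTHETICAL is a placeholder, not the real Wikidata ID of the Tennis racket entity.

```python
import json
from collections import defaultdict
from collections.abc import Iterable, Iterator


def iter_section_links(
    lines: Iterable[str], site: str = "enwiki"
) -> Iterator[tuple[str, str, str, str]]:
    """Yield (entity_id, page_title, section_title, url) for each section link."""
    for raw in lines:
        # Assumed Wikidata-style framing: '[' / ']' lines and trailing commas.
        line = raw.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        for record in entity.get("sectionlinks", {}).get(site, []):
            # Compound titles look like 'Page#Section'; split on the first '#'.
            page, _, section = record["title"].partition("#")
            yield entity["id"], page, section, record["url"]


def sections_by_entity(lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Group (page, section) pairs by entity ID; one entity may map to many sections."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for entity_id, page, section, _url in iter_section_links(lines):
        grouped[entity_id].append((page, section))
    return dict(grouped)


# Sample input: the first entity is the dataset's own example; the second
# entity ID is a HYPOTHETICAL placeholder used to show a one-to-many mapping.
SAMPLE_LINES = [
    "[",
    '{"id": "Q715509", "sectionlinks": {"enwiki": [{"site": "enwiki", '
    '"title": "Places in Harry Potter#Azkaban", '
    '"url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban"}]}},',
    '{"id": "Q_HYPOTHETICAL", "sectionlinks": {"enwiki": ['
    '{"site": "enwiki", "title": "Tennis#Rackets", '
    '"url": "https://en.wikipedia.org/wiki/Tennis#Rackets"}, '
    '{"site": "enwiki", "title": "Racket (sports equipment)#Tennis", '
    '"url": "https://en.wikipedia.org/wiki/Racket_(sports_equipment)#Tennis"}]}}',
    "]",
]

for entity_id, pairs in sections_by_entity(SAMPLE_LINES).items():
    print(entity_id, pairs)
```

For a real dump, an open text file (for example, opened with encoding="utf-8") can be passed in place of SAMPLE_LINES; because entities are parsed one line at a time, the whole dump never has to be held in memory.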
Created: 2023-06-28
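Compiled summary
Dataset introduction

Background and challenges
Background overview
This dataset is an extension of Wikidata that provides fine-grained mappings between Wikidata entities and sections of English Wikipedia, with a particular focus on 'orphan' entities that have no standalone Wikipedia article. An entity can be mapped to multiple sections, and the data are provided in JSON and RDF formats that follow the Wikibase data model.
The above content was collected and summarized by 遇见数据集.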



