Citations with contexts in Wikipedia
Source: Figshare | Updated: 2017-12-01 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/Citations_with_contexts_in_Wikipedia/5588842/1
Description:
This dataset provides structured metadata and contextual information about references added to Wikipedia articles, in JSON format. Each record represents an individual Wikipedia article revision, with all tags parsed as stored in Wikipedia's XML dumps, and includes information about:
1) the context(s) in which the reference occurs within the article, such as the surrounding text, the parent section title, and the section level;
2) structured data and bibliographic metadata included within the reference itself, such as any citation template used, external links, and any known persistent identifiers;
3) additional data/metadata about the reference itself: the reference name, its raw content, and, where applicable, the revision ID associated with the reference's addition, deletion, or change.
The data is available as a set of compressed JSON files extracted from the July 1, 2017 XML dump of English Wikipedia. Other languages may be added to the dataset in the future.
The JSON schema and the Python parsing libraries used to generate the data are listed in the references.
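Because the exact field names are defined by the JSON schema in the references and are not reproduced here, a reasonable first step is to stream one file and tally the top-level keys that actually occur. The sketch below is a minimal, non-authoritative example that assumes each compressed file is gzip-compressed and stores one JSON record per line (JSON Lines). Both the compression format and the line-per-record layout are assumptions; if the files turn out to hold a single JSON array, or use a different compressor such as bz2, the reading code needs to change accordingly. The function names are hypothetical helpers, not part of the dataset's tooling.

```python
import gzip
import json
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-blank line.

    Assumes a JSON Lines layout (one JSON object per line); this is not
    confirmed by the dataset description.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Stream records from a gzip-compressed file (assumed format).

    Opening in text mode ("rt") decompresses on the fly, so a large
    dump file is never fully loaded into memory.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        yield from parse_lines(handle)


def count_top_level_keys(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each top-level key appears across records.

    Useful for discovering the real schema before writing any
    field-specific extraction code.
    """
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

For example, `count_top_level_keys(iter_records("part.json.gz")).most_common()` (with a hypothetical file name) lists the keys present in one file together with their frequencies. Separating `parse_lines` from `iter_records` keeps the parsing logic independent of the compression format, so only `iter_records` has to change if the files use a different compressor.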
Provided by:
Meen Chul Kim
Created:
2017-12-01



