EB-KG: Knowledge Graph of the first 8 editions of the Encyclopaedia Britannica (1768-1860)
Source: Zenodo | Updated: 2022-08-25 | Indexed: 2026-05-26
Download link:
https://zenodo.org/record/6673897
Description:
This Knowledge Graph represents the information in the first eight editions of the Encyclopaedia Britannica (1768 to 1860) in RDF (Turtle, .ttl format). The raw dataset is provided by the National Library of Scotland (NLS) at this link; it comprises eight editions and a total of 195 volumes, with a total size of 44 GB. It uses two XML schemas: METS for descriptive, structural, technical and administrative metadata (title, author, publisher, etc.), and ALTO for encoding the OCR text of each page.

In this work, we extracted the information from the METS and ALTO XML files using the defoe tool and developed novel information extraction heuristics. With the extracted information, we created the EB-KG Knowledge Graph, which uses the EB Ontology to represent it. During the information extraction phase, we also employed several techniques to mitigate two common OCR errors: the long s (ſ, frequently misread as "f") and line-break hyphenation.

The EB-KG contains 1,638,239 RDF triples covering all 8 editions. Each edition can have several Volumes and references to Books and Supplements; it also has an Editor and a Publisher, which can be a Person or an Organization. A Volume has several Pages, each of which can contain several Terms. A Term is either a Topic (a term described across several pages, often combining text, pictures, and tables) or an Article (a one- or two-paragraph description of the term, similar to a dictionary entry). The data model of the EB-KG can be found here.

Because the original ALTO files do not indicate the start and end of each EB term, the first part of our work involved the automated extraction of all terms (along with their metadata) across editions, so that they can be analysed independently of the surrounding text.
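As a rough illustration of what the ALTO layer holds, the following minimal Python sketch pulls the OCR text of one page out of an ALTO file, one string per TextLine. It uses only the standard library, not the defoe tool used for EB-KG, and it assumes the usual ALTO layout in which each word is a String element with a CONTENT attribute and a line-end hyphen is a separate HYP element. The sample page and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the namespace, e.g. '{http://www.loc.gov/standards/alto/ns-v2#}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_page_lines(alto_xml: str) -> list[str]:
    """Return the OCR text of one ALTO page, one string per TextLine.

    Tags are matched by local name, so the code does not depend on which
    ALTO namespace version the file declares.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        words = []
        for child in elem:
            name = local_name(child.tag)
            if name == "String":
                words.append(child.get("CONTENT", ""))
            elif name == "HYP" and words:
                # A line-end hyphen is stored as a separate HYP element.
                words[-1] += child.get("CONTENT", "-")
            # SP (space) and other elements are skipped; words are re-joined with " ".
        lines.append(" ".join(words))
    return lines


# Hypothetical two-line page in the ALTO v2 namespace.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="ABACUS,"/><SP/><String CONTENT="in"/><SP/><String CONTENT="arith"/><HYP CONTENT="-"/></TextLine>
    <TextLine><String CONTENT="metic,"/><SP/><String CONTENT="a"/><SP/><String CONTENT="table."/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_page_lines(SAMPLE))  # ['ABACUS, in arith-', 'metic, a table.']
```

Matching on local names rather than a fixed namespace keeps the sketch usable across ALTO versions, at the cost of ignoring any schema validation.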
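The two OCR problems named above (the long s and line-break hyphenation) can be illustrated with a small, hypothetical cleanup pass. This is not the heuristic used to build EB-KG; it simply assumes a vocabulary (the tiny VOCAB set stands in for a real word list) and (1) re-joins a word split by a line-break hyphen only when the merged form is a known word, and (2) tries swapping each lowercase "f" for "s" until the word matches the vocabulary, since the long s only occurred in lowercase.

```python
import re
from itertools import product

# Stand-in vocabulary (hypothetical); a real pipeline would load a full word list.
VOCAB = {"arithmetic", "is", "the", "science", "of", "numbers",
         "first", "taught", "in", "schools"}

HYPHEN_BREAK = re.compile(r"(\w+)-\n(\w+)")
WORD = re.compile(r"[A-Za-z]+")


def join_hyphenated(text: str, vocab: set[str]) -> str:
    """Re-join 'arith-\\nmetic' as 'arithmetic' when the merged word is known."""
    def repl(m: re.Match) -> str:
        head, tail = m.group(1), m.group(2)
        merged = head + tail
        # Unknown merged form: keep the hyphen (it may be a genuine compound).
        return merged if merged.lower() in vocab else f"{head}-{tail}"
    return HYPHEN_BREAK.sub(repl, text)


def fix_long_s(text: str, vocab: set[str]) -> str:
    """Replace lowercase 'f' with 's' where that yields a known word ('fcience' -> 'science')."""
    def repl(m: re.Match) -> str:
        word = m.group(0)
        if "f" not in word or word.lower() in vocab:
            return word
        positions = [i for i, ch in enumerate(word) if ch == "f"]
        # Try every f/s combination, since a word may mix real f's and long s's.
        for choice in product("fs", repeat=len(positions)):
            chars = list(word)
            for pos, ch in zip(positions, choice):
                chars[pos] = ch
            candidate = "".join(chars)
            if candidate.lower() in vocab:
                return candidate
        return word  # no known reading: leave unchanged
    return WORD.sub(repl, text)


raw = "Arith-\nmetic is the fcience of numbers, firft taught in fchools; well-\nknown."
print(fix_long_s(join_hyphenated(raw, VOCAB), VOCAB))
# Arithmetic is the science of numbers, first taught in schools; well-known.
```

A dictionary check is the simplest way to avoid damaging genuine words such as "of", but it cannot decide between two readings that are both real words (for example "fame" and "same"); those would need surrounding context.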
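Since the ALTO files carry no term boundaries, some rule is needed to decide where one entry ends and the next begins. The sketch below assumes that an entry opens with a capitalised headword followed by a comma (for example "ABACUS, ..."), a layout common in the early editions; it is a hypothetical heuristic for illustration, not the extraction method used for EB-KG. A line matching that pattern starts a new term, and any other line is appended to the current term's definition.

```python
import re

# Hypothetical heuristic: an entry opens with an upper-case headword of two or
# more letters followed by a comma, e.g. "ABACUS, an ancient instrument ...".
HEADWORD = re.compile(r"^([A-Z][A-Z'\-]+),\s+(.*)")


def split_terms(lines: list[str]) -> list[dict[str, str]]:
    """Group page lines into {'term': ..., 'definition': ...} records."""
    terms: list[dict[str, str]] = []
    current = None
    for line in lines:
        match = HEADWORD.match(line)
        if match:
            current = {"term": match.group(1), "definition": match.group(2)}
            terms.append(current)
        elif current is not None:
            # Continuation of the current term's text.
            current["definition"] += " " + line.strip()
        # Lines before the first headword are ignored in this sketch.
    return terms


# Hypothetical page text.
page = [
    "ABACUS, an ancient instrument for",
    "reckoning.",
    "ABBEY, a monastery governed by",
    "an abbot.",
]
for record in split_terms(page):
    print(record)
# {'term': 'ABACUS', 'definition': 'an ancient instrument for reckoning.'}
# {'term': 'ABBEY', 'definition': 'a monastery governed by an abbot.'}
```

Text that continues from the previous page is dropped here, and single-letter or multi-word headwords would need extra patterns; a fuller version would also have to tell short Articles apart from multi-page Topics.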
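To make the Edition, Volume, Page and Term hierarchy concrete, the sketch below turns one extracted Article into RDF triples and serialises them as N-Triples, which is a subset of Turtle and therefore valid in a .ttl file. The http://example.org/eb# namespace, the IRI pattern and the property names (hasVolume, hasPage, hasTerm, definition) are placeholders, not the EB Ontology's actual vocabulary.

```python
# Hypothetical namespace and property names; the real EB Ontology vocabulary may differ.
EX = "http://example.org/eb#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    """Escape a string as an N-Triples/Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def article_triples(edition: int, volume: int, page: int, n: int,
                    headword: str, definition: str) -> set[Triple]:
    """Link one Article to its Page, Volume and Edition (hypothetical IRI scheme)."""
    ed = f"{EX}edition{edition}"
    vol = f"{ed}_volume{volume}"
    pg = f"{vol}_page{page}"
    term = f"{pg}_term{n}"
    return {
        (iri(ed), iri(RDF_TYPE), iri(EX + "Edition")),
        (iri(ed), iri(EX + "hasVolume"), iri(vol)),
        (iri(vol), iri(RDF_TYPE), iri(EX + "Volume")),
        (iri(vol), iri(EX + "hasPage"), iri(pg)),
        (iri(pg), iri(RDF_TYPE), iri(EX + "Page")),
        (iri(pg), iri(EX + "hasTerm"), iri(term)),
        (iri(term), iri(RDF_TYPE), iri(EX + "Article")),
        (iri(term), iri(RDFS_LABEL), literal(headword)),
        (iri(term), iri(EX + "definition"), literal(definition)),
    }


def to_ntriples(triples: set[Triple]) -> str:
    """One 'subject predicate object .' line per triple, sorted for stable output."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


graph = article_triples(1, 1, 5, 1, "ABACUS", "an ancient instrument for reckoning.")
graph |= article_triples(1, 1, 5, 2, "ABBEY", "a monastery governed by an abbot.")
print(len(graph))  # 13
print(to_ntriples(graph))
```

Triples are kept in a Python set, so the Edition, Volume and Page triples shared by several terms on the same page are stored only once, mirroring RDF's set semantics.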
Provided by:
Zenodo
Created:
2022-06-21



