
Handbooks prepared by the Historical Section of the Foreign Office

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14827701
Resource description:
The PeaceBooks corpus is a set of volumes with historical, ethnographic, cartographic, and economic information on almost every part of the world, prepared by the Historical Section of the Foreign Office for use by the British delegates to the Paris Peace Conference. The corpus is composed of texts that were published in 25 volumes in 1920. The copies that make up this corpus were digitized and OCR-ed by Google and were provided by the University of Iowa (16 volumes), the University of Michigan (8 volumes), and the University of Wisconsin (one volume: volume 6, which covers France, Italy, and Spain). The original files were downloaded from HathiTrust. Typographical errors and OCR mistakes were corrected. Peritext elements in all volumes, such as volume titles, chapter and section headings, footnotes, page numbers, and titles in the text, were annotated. Structural elements are tagged with one or more hash signs '#', reflecting the hierarchical structure of the volume; footnotes start with a tilde sign '~'. The work is in the public domain. The file peace_books_ids.tsv contains the identifiers of the source files in HathiTrust, as well as metadata for the individual files (the columns are explained in the related GitHub repository).
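The annotation scheme described above can be read with a few lines of Python. The following is a minimal sketch, not the corpus authors' tooling: it assumes that markers appear at the start of a line, that the number of leading '#' characters gives the heading depth, and that a leading '~' marks a footnote. The `classify` function, the `Line` type, and the sample lines are hypothetical and only illustrate the idea; the actual files may differ in details such as spacing after the marker.

```python
import re
from typing import NamedTuple


class Line(NamedTuple):
    kind: str   # "heading", "footnote", or "text"
    level: int  # number of leading '#' for headings, 0 otherwise
    text: str   # line content with the marker removed


# Assumption: structural markers sit at the very start of the line.
_HEADING = re.compile(r"^(#+)\s*(.*)$")


def classify(raw: str) -> Line:
    """Classify one corpus line by its leading annotation marker."""
    stripped = raw.strip()
    match = _HEADING.match(stripped)
    if match:
        # One or more '#': depth equals the number of hash signs.
        return Line("heading", len(match.group(1)), match.group(2))
    if stripped.startswith("~"):
        # A leading '~' marks a footnote.
        return Line("footnote", 0, stripped[1:].strip())
    return Line("text", 0, stripped)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the corpus.
    sample = [
        "# Example Volume Title",
        "## Political History",
        "Some running text of the chapter.",
        "~ An example footnote.",
    ]
    for raw in sample:
        print(classify(raw))
```

Keeping the heading level as an integer makes it straightforward to rebuild the volume hierarchy later, for example by tracking the most recent heading at each depth while iterating over a file.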
Created:
2025-03-17