Manually validated PageXML files for images in book "Lettres du sieur de Balzac"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14146751
Description:
Transcription of 57 pages of the 637-page printed book "Lettres du sieur de Balzac" by Jean-Louis de Balzac (1597-1654), published in 1624. The transcription is provided as PageXML files, suitable for training an optical character recognition (OCR) model. The PageXML files were created by applying a public multilingual Transkribus model ("Transkribus print M1") to the images at https://github.com/Heresta/OCR17plus/blob/main/Data/Balzac1624_Lettres_btv1b86262420_corrected and then manually validating the result. During manual validation, the transcription was normalized (spelling adapted to modern French).
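As a rough illustration of how such PageXML files might be read when preparing OCR training data, the sketch below pulls the line-level transcriptions out of one PAGE document using only the Python standard library. The sample document, its line ids, its text, and the function name extract_lines are hypothetical and are not taken from the dataset; the namespace shown is one common PAGE schema version and is an assumption about these files, so the code reads the namespace from the root element instead of hard-coding it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal PAGE document for illustration only;
# the real files in the dataset will contain coordinates and more metadata.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <TextEquiv><Unicode>Monsieur,</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <TextEquiv><Unicode>Je vous écris cette lettre.</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>region-level text, ignored</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
"""


def extract_lines(xml_text):
    """Return (line id, transcription) pairs for every TextLine in a PAGE document."""
    root = ET.fromstring(xml_text)
    # Derive the namespace from the root tag so other PAGE schema versions also work.
    prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    lines = []
    for line in root.iter(f"{prefix}TextLine"):
        # Only the TextEquiv that is a direct child of the line is used,
        # which skips word-level and region-level transcriptions.
        unicode_el = line.find(f"{prefix}TextEquiv/{prefix}Unicode")
        if unicode_el is not None and unicode_el.text:
            lines.append((line.get("id"), unicode_el.text))
    return lines


if __name__ == "__main__":
    for line_id, text in extract_lines(SAMPLE):
        print(line_id, text)
```

For a file on disk, the same function could be applied to the file's contents read as text. Since the transcriptions were normalized to modern French spelling, the extracted text would reflect that normalization and not the original 1624 orthography.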
Created:
2024-11-13



