
Manually validated PageXML files for images in monography "Mémoire sur St Domingue par H ? M. Michel"

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13784410
Description:
Transcription of the monograph "Mémoire sur St Domingue par H ? M. Michel", dating from 1797 and dealing with slavery in Haiti (103 pages in total). The transcription contains 61 pages in PageXML format, useful for training a handwritten text recognition model. The PageXML files were created by applying a Transkribus model (French Model 1, see https://readcoop.eu/model/french-general-model/, or the non-public The Text Titan I) to the images at https://europeana.transcribathon.eu/documents/story/?story=12733. The PageXML output was automatically corrected using the flat-text manual transcription available with these images, and the resulting PageXML files were manually validated. The software for automatically correcting OCR output using flat-text manual transcriptions (and hence adding a link between image and text that is not present in the flat-text files) was developed in the AI4Culture project (https://pro.europeana.eu/project/ai4culture-an-ai-platform-for-the-cultural-heritage-data-space). Note: transcriptions for pages 21, 22, 34 and 58 are not yet present.
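
For readers who want to use the PageXML files as training data, the following is a minimal sketch of how line-level text and its image coordinates might be read from a single file. It relies only on the standard PAGE element names (TextLine, Coords, TextEquiv, Unicode); the function name `extract_lines`, the sample document, its file name and its line text are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def extract_lines(pagexml: str):
    """Return (line id, polygon points, text) for every TextLine in a PageXML string."""
    root = ET.fromstring(pagexml)
    lines = []
    # "{*}" matches any namespace, so the exact PAGE schema version does not matter
    # (requires Python 3.8 or later).
    for line in root.findall(".//{*}TextLine"):
        coords = line.find("{*}Coords")
        # Only the TextEquiv directly under the line, not those of nested Word elements.
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        points = coords.get("points", "") if coords is not None else ""
        text = (unicode_el.text or "") if unicode_el is not None else ""
        lines.append((line.get("id"), points, text))
    return lines


# Hypothetical sample, invented for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="50,50 950,50 950,300 50,300"/>
      <TextLine id="r1l1">
        <Coords points="60,60 940,60 940,120 60,120"/>
        <TextEquiv><Unicode>first example line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="60,130 940,130 940,190 60,190"/>
        <TextEquiv><Unicode>second example line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

for line_id, points, text in extract_lines(SAMPLE):
    print(line_id, points, text)
```

Each returned tuple pairs a line's polygon with its validated text, which is the image-to-text link the flat-text transcription lacks. To process a file from disk, its contents can be read and passed to the same function.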
Created:
2024-09-18