Manually validated PageXML files for images in "Diario del soldato Bruno Celestino"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13760585
Description:
PageXML transcription of the World War I diary of Italian soldier Bruno Celestino. The diary has 52 pages in total, of which pages 2 to 49 were transcribed. These files can be used to train a handwritten text recognition (HTR) model. The PageXML files were created in three steps: applying Transkribus's Italian Handwriting M1 model (https://readcoop.eu/model/italian-general-model/) to the images at https://europeana.transcribathon.eu/documents/story/?story=110659, automatically correcting the output using the flat-text manual transcription available with these images, and manually validating the resulting PageXML files. The software for the automatic correction step, which uses flat-text manual transcriptions to correct OCR output and thereby adds the link between image and text that the flat-text files lack, was developed as part of the AI4Culture project (https://pro.europeana.eu/project/ai4culture-an-ai-platform-for-the-cultural-heritage-data-space).
Created:
2024-09-13
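
Since the description recommends these files for training a handwritten text recognition model, the following minimal sketch shows one way to pull (line polygon, transcription) pairs out of a PageXML file using only the Python standard library. It assumes the usual PAGE layout, in which each `TextLine` element has a `Coords` child with a `points` attribute and a `TextEquiv/Unicode` child holding the text. The function name `read_text_lines`, the namespace version, and the embedded `SAMPLE` document are illustrative assumptions; the sample text is invented and is not taken from the diary.

```python
import io
import xml.etree.ElementTree as ET

# Invented miniature PageXML document, for illustration only.
# The namespace version is an assumption; real files may declare another one.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,20 200,20 200,60 10,60"/>
        <TextEquiv><Unicode>riga di esempio</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <Coords points="10,70 200,70 200,110 10,110"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def _local(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_source):
    """Return one dict per TextLine: its id, polygon points, and text.

    xml_source may be a file path or a binary file-like object.
    """
    root = ET.parse(xml_source).getroot()
    lines = []
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        points, text = [], ""
        # Only direct children are inspected, so word-level TextEquiv
        # elements nested deeper do not overwrite the line-level text.
        for child in elem:
            name = _local(child.tag)
            if name == "Coords":
                raw = child.get("points", "")
                points = [tuple(int(v) for v in pair.split(","))
                          for pair in raw.split()]
            elif name == "TextEquiv":
                for sub in child:
                    if _local(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "points": points, "text": text})
    return lines


lines = read_text_lines(io.BytesIO(SAMPLE))
for line in lines:
    print(line["id"], line["points"], repr(line["text"]))
```

To use this on the dataset, pass the path of a downloaded PageXML file instead of the in-memory sample. Lines without a transcription come back with an empty string, so a training pipeline can filter them out before cropping the corresponding image regions.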



