HTR - Araucania - XIX manuscript
Indexed by NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7075074
Description:
General
Ground truth dataset for 19th-century Spanish typewritten OCR (XML-ALTO)
The archives come from the events of the Occupation of Araucania (1850-1881) in Chile. They are archived in the 'Colección manuscritos' of the Archivo Central Andres Bello - Universidad de Chile.
For that reason, the images (.jpg) cannot be distributed publicly.
To use them for segmentation/recognition model training, please contact me: archivo.central@uchile.cl
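As a rough illustration of how the XML-ALTO ground truth could be read, the sketch below pulls the text of each line out of an ALTO document using only the Python standard library. It relies on the `TextLine` and `String` elements and the `CONTENT` attribute defined by the ALTO schema; the helper names and the sample document (including its text) are made up for the example and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace prefix: '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local(elem.tag) == "TextLine":
            words = [
                s.get("CONTENT", "")
                for s in elem.iter()
                if local(s.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Hypothetical sample, not an excerpt from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Santiago"/><SP/><String CONTENT="1862"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_lines(SAMPLE))
```

Matching on the local tag name keeps the sketch independent of the ALTO namespace version used in the files.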
Methodology
Transcription rules:
- xxx for blurred or unreadable characters
- ^+letters for superscript letters
- ⁋ for new paragraph
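The rules above can be handled programmatically. The following is a minimal sketch that assumes "^+letters" means a caret immediately followed by the superscript letters, and that ⁋ separates paragraphs; the function name and the sample string are hypothetical and not taken from the dataset.

```python
import re

UNREADABLE = "xxx"   # placeholder for blurred or unreadable characters
PARAGRAPH = "⁋"      # new-paragraph marker
# Assumption: a caret followed by one or more letters marks a superscript.
SUPERSCRIPT = re.compile(r"\^([^\W\d_]+)")


def describe(transcription: str) -> dict:
    """Summarise the special markers found in one transcription string."""
    paragraphs = [p.strip() for p in transcription.split(PARAGRAPH) if p.strip()]
    return {
        "paragraphs": paragraphs,
        "unreadable": transcription.count(UNREADABLE),
        "superscripts": SUPERSCRIPT.findall(transcription),
    }


# Hypothetical sample line, invented for illustration.
sample = "⁋ El Sr. Com^te xxx llegó ⁋ Saludos"
info = describe(sample)
print(info["paragraphs"])
print(info["unreadable"], info["superscripts"])
```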
The Kraken OCR engine is used in fine-tuning mode with the Menu_MacFrench template. One template applies NFKD Unicode normalization.
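To make the NFKD step concrete, here is a small sketch using Python's standard `unicodedata` module. It only shows what NFKD does to a string (compatibility decomposition, so an accented letter becomes a base letter plus a combining mark); how the normalization is wired into the Kraken training run is not shown here, and the helper name is hypothetical.

```python
import unicodedata


def to_nfkd(text: str) -> str:
    """Apply Unicode NFKD (compatibility decomposition) to a string."""
    return unicodedata.normalize("NFKD", text)


word = "Araucan\u00eda"      # 'Araucanía' with a precomposed í (U+00ED)
decomposed = to_nfkd(word)   # í becomes 'i' + U+0301 (combining acute accent)

print(len(word), len(decomposed))
print([f"U+{ord(c):04X}" for c in decomposed[-3:]])
```

Because decomposed text has more code points per word, a model trained on NFKD output predicts base letters and diacritics as separate symbols, which may partly explain why the two variants are evaluated separately.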
SegmOnto ontology
Evaluation
Name                    Quantity (GT)  Val_acc  Test_acc  CER      WER
HTR-Araucania_XIX       180            0.90354  0.8673    0.05598  0.21423
HTR-Araucania_XIX_NFKD  180            0.89872  0.8563    0.06646  0.24963
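For readers unfamiliar with the CER and WER columns, the sketch below shows the usual textbook definitions: Levenshtein edit distance divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. This is an illustration under that assumption, not the evaluation script used for the figures above, and the sample strings are invented.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical strings, invented for illustration.
print(cer("ocupacion", "ocupasion"))
print(wer("la ocupacion de Araucania", "la ocupasion de Araucania"))
```

A single wrong character spoils a whole word, which is why WER is typically several times larger than CER, as in the table.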
Others
JSONL file for NER annotation in ner/ (MISC, LOC, PERS, ORG, DATE)
Created:
2022-09-13



