M-POPP
Source: arXiv. Indexed 2025-09-30.
Download link:
https://zenodo.org/records/11296970
Description:
M-POPP is a dataset for full-text handwritten text recognition (HTR) and information extraction (IE), built from Parisian marriage records dated 1880 to 1940. It covers more than 100 distinct handwriting styles and provides blocks annotated specifically for two tasks: HTR alone, and HTR combined with information extraction (HTR+IE).
Provider:
Available on Zenodo



