ICDAR2019 Post-OCR Text Correction
Payititi (帕依提提), indexed 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-323.html
Resource description:
This original corpus consists of OCRed documents in 10 European languages, totaling about 20M characters (3.5M tokens), aligned with their corresponding Gold Standard (Ground Truth). Each language contains one or several sub-folders (unbalanced), organized by the source of the collected datasets. Each training file contains three blocks in a fixed structure. Note that only the first block, [OCR_output], will be included in the test set.
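As a rough illustration of how a file with bracket-tagged blocks could be read, here is a minimal Python sketch. It assumes each block starts on its own line with a tag in square brackets followed by that block's text; this layout is an assumption, not something the listing confirms. Only [OCR_output] is named in the description, so the tags [BLOCK_2] and [BLOCK_3], the sample text, and the helper name parse_blocks are hypothetical placeholders.

```python
import re

# Assumption: a block begins on its own line with a bracketed tag such as
# "[OCR_output]", followed by the block's text on the same line.
BLOCK_RE = re.compile(r"^\[\s*([^\]]+?)\s*\]\s?(.*)$")


def parse_blocks(text):
    """Return a dict mapping each bracketed tag to the text that follows it."""
    blocks = {}
    current = None
    for line in text.splitlines():
        match = BLOCK_RE.match(line)
        if match:
            # A new tagged block starts here.
            current = match.group(1)
            blocks[current] = match.group(2)
        elif current is not None:
            # Untagged line: treat it as a continuation of the current block.
            blocks[current] += "\n" + line
    return blocks


# Hypothetical sample; only [OCR_output] is named in the description.
sample = (
    "[OCR_output] Tbe qnick brown fox\n"
    "[BLOCK_2] placeholder for the second block\n"
    "[BLOCK_3] placeholder for the third block\n"
)
blocks = parse_blocks(sample)
ocr_text = blocks["OCR_output"]
```

For a test-set file, which per the description carries only the first block, the same function would return a dict with the single key OCR_output. The real tag names and layout should be checked against the files in the downloaded archive before relying on this.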
Provider:
Payititi (帕依提提)



