ICDAR2019 Post-OCR Text Correction

Name: ICDAR2019 Post-OCR Text Correction
Creator: 帕依提提
License: No description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-323.html

Description:

This corpus consists of OCRed documents in 10 European languages, totalling about 20M characters (3.5M tokens), aligned with their corresponding Gold Standard (Ground Truth). Each language contains one or more sub-folders (unbalanced in size), organized by the source from which the data was collected. Each training file contains three blocks. Note that only the first block, [OCR_output], is included in the test set.
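As a rough illustration of reading such a file, the sketch below splits text into labelled blocks. It assumes each block sits on one line that starts with a bracketed label; the description only names [OCR_output], so the other two labels in the sample (`block_2`, `block_3`), the sample text, and the `parse_blocks` helper are hypothetical and should be checked against the actual files.

```python
import re
from typing import Dict

# Matches a line that starts with a bracketed label, e.g. "[OCR_output] text".
# Assumption: one block per line; verify this against the real files.
BLOCK_LINE = re.compile(r"^\[\s*([^\]]+?)\s*\]\s?(.*)$")


def parse_blocks(text: str) -> Dict[str, str]:
    """Return {label: content} for every line that starts with a [label]."""
    blocks: Dict[str, str] = {}
    for line in text.splitlines():
        match = BLOCK_LINE.match(line)
        if match:
            label, content = match.groups()
            blocks[label] = content
    return blocks


# Hypothetical sample: only "[OCR_output]" comes from the description;
# "block_2" and "block_3" are placeholder labels, not the dataset's own.
sample = (
    "[OCR_output] Tbe quick brovvn fox\n"
    "[block_2] Tbe quick brovvn fox\n"
    "[block_3] The quick brown fox\n"
)

blocks = parse_blocks(sample)
print(blocks["OCR_output"])
```

Because the parser keys on whatever labels it finds rather than a fixed list, a test file that contains only the first block is handled the same way as a training file.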

Provider:

帕依提提
