
ICDAR2019 Post-OCR Text Correction

Indexed by Payititi (帕依提提) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-323.html
Description:
This original corpus consists of OCRed documents in 10 European languages, totalling about 20M characters (3.5M tokens), aligned with their corresponding Gold Standard (Ground Truth). Each language contains one or several sub-folders (unbalanced), organized by the source from which the data was collected. Each training file contains three blocks. Note that only the first block, [OCR_output], will be included in the test set.
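As a rough illustration of the block structure described above, the following Python sketch splits a training file into its tagged blocks. It assumes that each block sits on a single line starting with a bracketed tag. Only the tag name [OCR_output] comes from the description; the other two tag names, the sample text, and the helper name parse_blocks are hypothetical and should be checked against the actual files.

```python
import re

# Assumption: every block is one line that starts with a bracketed tag,
# e.g. "[OCR_output] some text". This layout is not confirmed by the listing.
TAG_LINE = re.compile(r"^\[\s*([^\]]+?)\s*\]\s?(.*)$")


def parse_blocks(text):
    """Return a dict mapping each bracketed tag to the text that follows it."""
    blocks = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tag, content = match.groups()
            blocks[tag] = content
    return blocks


# Hypothetical sample: only the [OCR_output] tag name appears in the description;
# the other two tag names and all of the text are invented for illustration.
sample = (
    "[OCR_output] Tbe quick brovvn fox\n"
    "[second_block_example] Tbe quick brovvn fox\n"
    "[third_block_example] The quick brown fox\n"
)

blocks = parse_blocks(sample)
print(blocks["OCR_output"])
```

A test file would, under the same assumption, yield a dict containing only the OCR_output key, so downstream code should not rely on the other two blocks being present.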
Provided by:
Payititi