
Dataset for ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/835488
Description:
Train-A: Pages with manually revised baselines and their corresponding transcripts. This batch is small (50 pages). Note that only the baselines have been manually corrected; the polygons associated with each line have not been manually reviewed.
Train-B: Pages without any layout or text line information. The corresponding transcripts are provided at page level, with line breaks. It has 10k pages, divided for convenience into two batches of 5k pages each. This information is provided in PAGE format.
Test-A: Pages with manually revised baselines. This batch has 65 pages. The polygons associated with each line have not been manually reviewed.
Test-B1: The same pages as Test-A, but annotated only with the geometry of regions. Text line information is not provided.
Test-B2: Page images annotated with the geometry of the regions in which text lines are to be detected and recognized. It has 57 pages.
Baseline.tgz: Baseline system trained on the first 40 pages of Train-A. The system is based on Laia, a deep learning toolkit for transcribing handwritten text images.
More information at: https://scriptnet.iit.demokritos.gr/competitions/~icdar2017htr/
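Because the annotations are distributed in PAGE format, a short sketch of reading baselines and line transcripts from a PAGE XML file may be useful. It assumes the usual PAGE layout, in which each TextLine element carries a Baseline child with a points attribute and an optional TextEquiv/Unicode child. The embedded sample document, its identifiers, and the helper names (local_name, parse_points, read_lines) are hypothetical and are not taken from the dataset; the exact namespace version in the real files may also differ, which is why the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PAGE XML fragment for illustration only.
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="10,10 990,10 990,300 10,300"/>
      <TextLine id="r1l1">
        <Coords points="10,10 990,10 990,100 10,100"/>
        <Baseline points="12,90 500,92 988,91"/>
        <TextEquiv><Unicode>first line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="10,110 990,110 990,200 10,200"/>
        <Baseline points="12,190 988,192"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return one dict per TextLine with its id, baseline points, and transcript."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        baseline = None
        text = None  # stays None when the line has no transcript
        for child in elem:
            name = local_name(child.tag)
            if name == "Baseline":
                baseline = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "baseline": baseline, "text": text})
    return lines


if __name__ == "__main__":
    for line in read_lines(SAMPLE_PAGE_XML):
        print(line["id"], line["baseline"], line["text"])
```

For a file on disk, the same function could be applied to the file contents read as text. Batches without text line information (Train-B, Test-B1, Test-B2) would be expected to yield an empty list, since they contain no TextLine elements.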
Created:
2020-01-24