222,522 Images of Chinese Handwriting OCR Data
Datatang, indexed 2024-05-23
Download link:
https://www.datatang.com/dataset/1333
Resource description:
This dataset contains 222,522 images of Chinese handwriting for optical character recognition (OCR). Writing surfaces include A4 paper, grid paper, ruled paper, whiteboards, colored sticky notes, and answer sheets, among others. The written content covers poems, prose, store promotion notices, greetings, wish lists, excerpted texts, compositions, and notes. The data varies in writing surface, handwriting style, content, and capture angle; images were captured at eye level and from below (upward-looking). Each line or column of text is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for Chinese handwriting OCR tasks.
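The annotation scheme described above (one quadrilateral box plus one transcription per line or column of text) could be represented roughly as follows. This is a minimal sketch, not the dataset's actual file format: the class name `TextRegion`, its fields, and the vertical-text heuristic are all assumptions made for illustration, and the sample coordinates are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextRegion:
    """One annotated line or column of handwriting (hypothetical schema)."""
    quad: List[Point]  # four corner points, listed in order around the region
    text: str          # transcription of that line or column

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the quad."""
        xs = [x for x, _ in self.quad]
        ys = [y for _, y in self.quad]
        return min(xs), min(ys), max(xs), max(ys)

    def area(self) -> float:
        """Area of the quadrilateral, computed with the shoelace formula."""
        total = 0.0
        n = len(self.quad)
        for i in range(n):
            x1, y1 = self.quad[i]
            x2, y2 = self.quad[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

    def is_vertical(self) -> bool:
        """Assumed heuristic: a region taller than it is wide is a column."""
        x_min, y_min, x_max, y_max = self.bounding_box()
        return (y_max - y_min) > (x_max - x_min)


# Invented example: a slightly skewed horizontal line of text.
region = TextRegion(
    quad=[(10, 20), (210, 30), (208, 70), (8, 60)],
    text="床前明月光",
)
print(region.bounding_box(), region.area(), region.is_vertical())
```

The enclosing box is handy for a quick crop before recognition, while the quadrilateral itself preserves the skew introduced by the non-frontal capture angles.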
Provided by:
Datatang (数据堂)
Collected summary
Dataset introduction

Background and challenges
Background overview
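The dataset contains 222,522 images of Chinese handwriting, covering a variety of writing surfaces such as A4 paper and grid paper and a range of content such as poems and prose. Annotations consist of quadrilateral boxes and transcriptions for line- or column-level text, making the dataset suitable for Chinese handwriting OCR tasks.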
The content above was collected and summarized by 遇见数据集.



