KOHTD (Kazakh Offline Handwritten Text Dataset)
Source: OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/KOHTD
Description:
Our database contains a large number of exam papers completed by students of Satbayev University and Al-Farabi Kazakh National University. The exams were written and answered in Kazakh (99%) and Russian (1%), as shown in the accompanying figure. After collecting the answer sheets, we scanned them and ran preprocessing experiments on the sheets: automatically detecting each sheet, estimating its contour, correcting its rotation, and segmenting it line by line and word by word. This lets us apply our deep learning models to recognize each word and to remove artifacts at the edges of the segmented word boundaries. We built our intelligent software on state-of-the-art deep learning models for natural language recognition and processing tasks, including optical character recognition (OCR) of handwritten Kazakh and Russian text.
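The line-by-line and word-by-word segmentation step described above can be illustrated with a projection-profile approach. This is only a minimal sketch: the description does not say which segmentation algorithm the authors used, and the names `runs`, `segment_lines`, `segment_words`, and the `min_gap` parameter are hypothetical. It assumes the scanned sheet has already been deskewed and binarized into a grid where 1 marks ink and 0 marks background.

```python
from typing import List, Tuple

Image = List[List[int]]   # binarized page: 1 = ink, 0 = background (assumed input)
Span = Tuple[int, int]    # half-open interval [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Find spans of non-zero profile values, merging spans whose
    separating gap is shorter than min_gap."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at the first empty slot
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same line/word
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Text lines are row ranges whose horizontal ink count is non-zero."""
    return runs([sum(row) for row in img])


def segment_words(img: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Words are column ranges inside one line, separated by at least
    min_gap empty columns (narrower gaps are treated as letter spacing)."""
    top, bottom = line
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile, min_gap)


# Toy page: two text lines; the first line holds two words.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
]
for line in segment_lines(page):
    print(line, segment_words(page, line))
```

Projection profiles are a simple baseline that works when lines are roughly horizontal and well separated. Real handwriting often has overlapping ascenders and descenders, so a production pipeline would likely need connected-component analysis or a learned segmenter instead.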
Provider:
OpenDataLab
Created:
2022-08-16
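Summary
Dataset introduction

Background and challenges
Background overview
KOHTD is a Kazakh offline handwritten text dataset. It contains a large number of scanned exam papers completed by students at universities in Kazakhstan and is intended mainly for research on optical character recognition of handwritten Kazakh and Russian text. The scans were preprocessed (for example, segmented and cleaned of artifacts) so that deep learning recognition models can be applied to them. The dataset was released in 2022 by the affiliated universities.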



