HKR (Handwritten Kazakh and Russian Database for Text Recognition)
OpenDataLab · Updated 2026-05-31 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/HKR
Resource description:
All text in this database is written in Cyrillic script. The Russian and Kazakh alphabets share the same 33 characters, and the Kazakh alphabet contains 9 additional characters of its own.
The dataset is a collection of forms. The source templates for all forms were generated with LaTeX and then filled in by hand. The database contains more than 1,400 filled forms, approximately 63,000 sentences, and more than 715,699 symbols, produced by approximately 200 different writers.
We used three different datasets, described as follows:
1. Handwritten samples (forms) of Kazakh and Russian keywords (regions, cities, villages, and so on);
2. Handwritten Kazakh and Russian letters of the Cyrillic alphabet;
3. Handwritten samples (forms) of Russian poetry.
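As an illustration of the alphabet described above (33 shared Cyrillic letters plus 9 Kazakh-specific ones), the following is a minimal sketch of a character vocabulary that a text recognizer might use for labels. It is not part of the dataset or its tooling: the function names `build_charset`, `build_index`, and `encode` are hypothetical, the vocabulary covers lowercase letters only, and reserving index 0 for a blank symbol is an assumed convention. The actual labels may also contain uppercase letters, digits, and punctuation.

```python
from typing import Dict, List

# The 33 letters shared by Russian and Kazakh: U+0430..U+044F (а..я) plus ё (U+0451).
RUSSIAN_LETTERS = "".join(chr(cp) for cp in range(0x0430, 0x0450)) + "\u0451"

# The 9 Kazakh-specific letters: ә ғ қ ң ө ұ ү һ і
KAZAKH_EXTRA_LETTERS = "\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456"


def build_charset(include_kazakh: bool = True) -> List[str]:
    """Return the sorted lowercase alphabet (33 letters, or 42 with Kazakh)."""
    letters = RUSSIAN_LETTERS + (KAZAKH_EXTRA_LETTERS if include_kazakh else "")
    return sorted(set(letters))


def build_index(charset: List[str]) -> Dict[str, int]:
    """Map each character to an integer label.

    Assumption: index 0 is left free for a blank symbol, so labels start at 1.
    """
    return {ch: i + 1 for i, ch in enumerate(charset)}


def encode(text: str, char_to_index: Dict[str, int]) -> List[int]:
    """Lowercase the text and encode it, skipping characters outside the vocabulary."""
    return [char_to_index[ch] for ch in text.lower() if ch in char_to_index]


if __name__ == "__main__":
    charset = build_charset()
    index = build_index(charset)
    print(len(RUSSIAN_LETTERS), len(KAZAKH_EXTRA_LETTERS), len(charset))
    print(encode("\u049a\u0430\u043b\u0430", index))  # "Қала" (Kazakh for "city")
```

Building the alphabet from Unicode code points rather than typed literals avoids accidentally mixing in visually identical Latin letters, such as Latin "i" in place of Cyrillic "і".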
Provider:
OpenDataLab
Created:
2022-08-19
Collected summary
Dataset introduction

Background and challenges
Background overview
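The HKR dataset is a database of handwritten Kazakh and Russian for text recognition. It is based on Cyrillic script, with 33 shared characters and 9 Kazakh-specific characters. The dataset consists of more than 1,400 hand-filled forms covering approximately 63,000 sentences and more than 715,699 symbols, written by approximately 200 writers, and includes keyword, alphabet, and poetry samples.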
The summary above was collected and generated by 遇见数据集.



