Virajbhanage/rukopys

Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Virajbhanage/rukopys
Resource overview:
RUKOPYS is the first large-scale open dataset for Ukrainian handwritten text recognition (HTR). It spans over a century of Ukrainian handwriting, from 1920s archival documents to present-day school homework, and is designed for end-to-end document understanding: region detection, type classification, and text transcription. Ukrainian is among the largest Slavic languages (45M+ native speakers), yet it had no dedicated open HTR dataset before RUKOPYS.

The dataset combines four sources (national dictation, state archives, university exams, and school homework) that differ across every dimension that makes handwriting recognition hard: time period, writers, document type, capture method, orthography, and content. It is divided into three splits: train for training, silver for self-training, and test for testing. The dataset also supports distinguishing handwritten from printed text, and it provides a detailed annotation schema and an anti-leakage design.
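
The three splits described above could be loaded along the following lines. This is a minimal sketch, not the dataset authors' documented usage: it assumes the repository can be read by the Hugging Face `datasets` library and that the splits are exposed under exactly the names `train`, `silver`, and `test`. The helper `load_split` and the `SPLIT_PURPOSE` table are hypothetical names introduced only for illustration.

```python
# Repository id as shown in the listing title.
REPO_ID = "Virajbhanage/rukopys"

# Split names and purposes as stated in the resource overview.
SPLIT_PURPOSE = {
    "train": "training",
    "silver": "self-training",
    "test": "testing",
}


def load_split(split: str):
    """Load one split of the dataset (hypothetical helper; needs network access)."""
    if split not in SPLIT_PURPOSE:
        raise ValueError(
            f"unknown split {split!r}; expected one of {list(SPLIT_PURPOSE)}"
        )
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the repository exposes its splits under these exact names.
    return load_dataset(REPO_ID, split=split)


# Example call (downloads data, so it is left commented out):
# train_set = load_split("train")
```

Validating the split name before importing `datasets` keeps a typo such as `"validation"` from triggering a download attempt. The column names and annotation fields are not given in this listing, so they should be inspected on the loaded object rather than assumed.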
Provider:
Virajbhanage