Iltanix/rukopys
Source: Hugging Face | Updated: 2026-04-23 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Iltanix/rukopys
Description:
RUKOPYS (Ukrainian: рукопис, "manuscript") is the first large-scale open dataset for Ukrainian handwritten text recognition (HTR). It spans more than a century of Ukrainian handwriting, from 1920s archival documents to present-day school homework, and is designed for end-to-end document understanding: region detection, type classification, and text transcription. Ukrainian is one of the largest Slavic languages (more than 45 million native speakers), yet before RUKOPYS it had no dedicated open HTR dataset. The dataset combines four sources that differ along the dimensions that make handwriting recognition hard (time period, writer, document type, capture method, orthography, and content), with the aim that models trained on it generalize across real-world variation. It provides three splits: train (human-annotated, for model training), silver (auto-annotated, for self-training), and test (for competition evaluation). The schema and loading instructions are documented in the README.
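The README remains the authoritative reference for loading. As a rough sketch only, the snippet below shows how one split might be read with the Hugging Face `datasets` library. It assumes that the repository id is `Iltanix/rukopys` and that the three subsets are exposed as split names `train`, `silver`, and `test`. This page does not confirm those identifiers, and the helper `load_rukopys` is hypothetical. The optional `mirror` argument sets the `HF_ENDPOINT` environment variable (for example to `https://hf-mirror.com`), which only takes effect if `huggingface_hub` has not already been imported in the process.

```python
import os
from typing import Optional

# Assumed split names (not confirmed by this page), mapped to the
# annotation source described above. Check the README for the real names.
SPLITS = {
    "train": "human-annotated",
    "silver": "auto-annotated",
    "test": "competition evaluation",
}


def load_rukopys(split: str = "train", streaming: bool = True,
                 mirror: Optional[str] = None):
    """Hypothetical helper: load one RUKOPYS split with `datasets`.

    streaming=True returns an IterableDataset, so records can be read
    without downloading the whole split first.
    """
    if split not in SPLITS:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLITS)}"
        )
    if mirror:
        # Read by huggingface_hub at import time, so set it before importing.
        os.environ["HF_ENDPOINT"] = mirror
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset
    return load_dataset("Iltanix/rukopys", split=split, streaming=streaming)
```

For example, `next(iter(load_rukopys("train")))` would return the first record as a dict. Its field names are not documented on this page, so inspect `record.keys()` instead of assuming a schema. The test split is intended for competition evaluation and may not ship with transcriptions.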
Provided by:
Iltanix



