Timka28/cyrillic
Source: Hugging Face. Updated: 2025-12-11. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Timka28/cyrillic
Description:
The Cyrillic Handwriting Mixed Dataset is a large combined dataset of handwritten Cyrillic text, intended for handwriting recognition (HWR), optical character recognition (OCR), multimodal models, and handwriting research. It contains 155,963 samples: 130,025 for training and 25,938 for testing. Each sample includes an image, its transcription, the source dataset, the final split (train/test), and the original split in the source dataset, if one exists. The dataset is compiled from multiple sources. Detailed text-length statistics, an example record, and instructions for downloading the dataset from Hugging Face are also provided.
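The sample counts above can be checked after download. The following is a minimal sketch, not the author's own instructions: it assumes the Hugging Face `datasets` package is installed, that the repository ID is `Timka28/cyrillic` (taken from the download link), and that the splits are exposed under the names `train` and `test`. The helper names `check_split_sizes` and `load_and_check` are hypothetical.

```python
# Counts taken from the dataset description; the split names are an assumption.
EXPECTED_SPLIT_SIZES = {"train": 130_025, "test": 25_938}
EXPECTED_TOTAL = 155_963


def check_split_sizes(actual_sizes):
    """Return a list of mismatch messages (empty if all counts agree)."""
    problems = []
    for name, expected in EXPECTED_SPLIT_SIZES.items():
        actual = actual_sizes.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    total = sum(actual_sizes.values())
    if total != EXPECTED_TOTAL:
        problems.append(f"total: expected {EXPECTED_TOTAL}, got {total}")
    return problems


def load_and_check(repo_id="Timka28/cyrillic"):
    """Download the dataset (needs network access) and compare split sizes."""
    # Imported lazily so check_split_sizes stays usable without the package.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    sizes = {name: split.num_rows for name, split in dataset.items()}
    return dataset, check_split_sizes(sizes)
```

Calling `load_and_check()` returns the loaded dataset together with a list of mismatches. An empty list means the downloaded splits match the counts stated in the description.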
Provider:
Timka28



