zirak-ai/PashtoOCR
Source: Hugging Face. Updated: 2025-09-11. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/zirak-ai/PashtoOCR
Description:
PsOCR is a large-scale synthetic dataset for Optical Character Recognition (OCR) in Pashto, a low-resource language. It contains one million synthetic images annotated at word, line, and document level, and covers extensive variation: 1,000 unique font families, diverse colors, image sizes, and text layouts. The dataset also includes a benchmark of 10,000 images for systematic evaluation and comparison of Pashto OCR systems.
Provider:
zirak-ai



