FIPU-OCR-CHAR: Font-Invariant Printed Urdu Character Dataset
Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/9cdk8y89v6/1
Description:
The FIPU-OCR-CHAR dataset is a large-scale, font-invariant corpus of printed Urdu characters. It is designed to support research in optical character recognition (OCR), font generalization, and script analysis. It contains 337,680 labeled images across 48 classes: 38 Urdu letters and 10 numerals. Each character was rendered in 201 distinct Urdu fonts and then transformed with 34 augmentation operations that simulate real-world printing, scanning, and distortion conditions. The total equals 48 × 201 × 35, or 7,035 images per class, which is consistent with each clean rendering being kept alongside its 34 augmented variants. All images are 28×28 PNG files with 24-bit color depth (8-bit RGB). Preliminary experiments with a ResNet-34 classifier indicate that high font diversity and a wide variety of augmentations improve the robustness and generalization of OCR models. The dataset was generated programmatically from font-rendered characters and processed through controlled augmentation pipelines. This yields consistent, class-balanced samples suitable for training, validation, and benchmarking. It is intended as a foundational resource for building and evaluating deep learning models for Urdu OCR, and as a baseline for future character-, word-, and line-level datasets.
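The counts and file format described above can be cross-checked with a short sketch, and the same sketch can sanity-check downloaded files. It rests on several assumptions:

- Each of the 201 fonts contributes one clean rendering plus its 34 augmented variants per class, which reproduces the stated 337,680 total.
- "24-bit" means 8-bit RGB PNGs (PNG color type 2).
- A font-disjoint train/validation/test split is wanted for measuring generalization to unseen fonts.

The description does not specify official splits or a file-naming scheme. All helpers here (`expected_total`, `font_disjoint_split`, `png_header`, `matches_spec`, `make_blank_png`) are hypothetical and not part of the dataset release. Only the Python standard library is used.

```python
import random
import struct
import zlib

# Figures stated in the dataset description.
NUM_CLASSES = 48        # 38 Urdu letters + 10 numerals
NUM_FONTS = 201
NUM_AUGMENTATIONS = 34

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def expected_total(include_original: bool = True) -> int:
    """Total image count; assumes one clean render per font/class is kept
    alongside its augmented variants when include_original is True."""
    variants = NUM_AUGMENTATIONS + (1 if include_original else 0)
    return NUM_CLASSES * NUM_FONTS * variants


def font_disjoint_split(font_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Hypothetical split: shuffle font IDs and assign each font to exactly one
    of train/val/test, so evaluation fonts are never seen during training."""
    fonts = sorted(font_ids)
    random.Random(seed).shuffle(fonts)
    n_test = round(len(fonts) * test_fraction)
    n_val = round(len(fonts) * val_fraction)
    test = set(fonts[:n_test])
    val = set(fonts[n_test:n_test + n_val])
    train = set(fonts[n_test + n_val:])
    return train, val, test


def png_header(data: bytes):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    return struct.unpack(">IIBB", data[16:26])


def matches_spec(data: bytes) -> bool:
    """True for 28x28 images with 24-bit color: 8 bits x 3 channels (color type 2, RGB)."""
    width, height, bit_depth, color_type = png_header(data)
    return (width, height) == (28, 28) and bit_depth == 8 and color_type == 2


def _chunk(kind: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC-32 over type+payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def make_blank_png(width: int, height: int) -> bytes:
    """Build a white 24-bit RGB PNG in memory, as a stand-in for a dataset file."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + b"\xff" * (3 * width)   # filter type 0, then RGB pixels
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(row * height)) + _chunk(b"IEND", b""))


if __name__ == "__main__":
    print(expected_total())                      # 337680
    print(matches_spec(make_blank_png(28, 28)))  # True
```

On real files, `matches_spec(open(path, "rb").read())` would flag any image that deviates from the stated 28×28, 24-bit format. A font-disjoint split is only one reasonable choice. A random per-image split would let the same font appear in both training and evaluation, which would hide exactly the font-generalization gap this dataset is meant to expose.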
Providing institution:
NED University of Engineering and Technology



