
FIPU-OCR-CHAR: Font-Invariant Printed Urdu Character Dataset

Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/9cdk8y89v6/1
Description:
The FIPU-OCR-CHAR dataset is a large-scale, font-invariant corpus of printed Urdu characters designed to support research in optical character recognition, font generalization, and script analysis. It contains 337,680 labeled images across 48 Urdu character classes: 38 letters and 10 numerals. Each character was rendered in 201 diverse Urdu font styles and then transformed with 34 augmentation operations that simulate real-world printing, scanning, and distortion conditions. Images are stored as 28×28 PNG files with 24-bit color depth. Preliminary experiments with ResNet-34 suggest that high font diversity and augmentation variety significantly improve the robustness and generalization of OCR models. The dataset was generated programmatically from font-rendered characters and processed through controlled augmentation pipelines, producing consistent, balanced samples suitable for training, validation, and benchmarking. It is intended as a foundational resource for building and evaluating deep learning models for Urdu OCR, and as a baseline for future character-, word-, and line-level datasets.
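The counts in the description can be cross-checked with a little arithmetic. The sketch below is a sanity check only, not part of the dataset's tooling. It rests on one assumption that the description does not state: each font-rendered character is kept as an unaugmented original alongside its 34 augmented variants, giving 35 images per class-and-font pair. Under that assumption the product matches the stated total exactly.

```python
# Counts taken from the dataset description.
NUM_CLASSES = 38 + 10        # 38 letters + 10 numerals
NUM_FONTS = 201
NUM_AUGMENTATIONS = 34
TOTAL_IMAGES = 337_680

# One base render per (class, font) pair.
base_renders = NUM_CLASSES * NUM_FONTS

# ASSUMPTION (not stated in the description): the unaugmented render is
# stored together with its 34 augmented variants.
variants_per_render = NUM_AUGMENTATIONS + 1

expected_total = base_renders * variants_per_render
images_per_class = TOTAL_IMAGES // NUM_CLASSES

print("base renders:", base_renders)
print("expected total:", expected_total)
print("matches stated total:", expected_total == TOTAL_IMAGES)
print("images per class:", images_per_class)
```

If the assumption holds, every class has the same number of images, which is consistent with the description's claim that the samples are balanced.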
Provided by:
NED University of Engineering and Technology