RAVI: Synthetic Urdu Text Image Dataset for OCR

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://data.mendeley.com/datasets/mhy5vxnths

Description:

The RAVI dataset is a synthetic image dataset designed to support the development and training of Urdu OCR (Optical Character Recognition) models. It consists of 99,000 images (256x256 pixels), each containing a single Urdu word rendered in black text on a white background. Each image is labeled with its corresponding Urdu word, enabling both supervised training and evaluation of word-level OCR systems.

The text is rendered in “Jameel Noori Nastaleeq”, a widely used Nastaliq-style Urdu font, at font size 40. The dataset is organized into subfolders corresponding to the letters of the Urdu alphabet, which simplifies categorization and retrieval and allows model performance to be evaluated letter by letter.

The dataset is aimed at researchers and developers working on CNN-based OCR systems for printed Urdu text, and it may also support future work on handwritten Urdu text recognition. It can serve as a benchmark for word-level OCR models, sequence prediction architectures, and other deep learning applications in low-resource languages.

Key Features:

- 99,000 annotated images
- Image resolution: 256x256 pixels
- Black Urdu text on white background
- Font: Jameel Noori Nastaleeq, size 40
- Organized alphabetically by Urdu letters
- Suitable for training, validation, and benchmarking of OCR systems
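The per-letter folder layout described above lends itself to a simple index and a letter-by-letter accuracy breakdown. The following is a minimal sketch only: the description does not state the image file extension, the subfolder naming, or how the word labels are stored, so the extension set and the helper names `index_by_letter` and `per_letter_accuracy` are assumptions made for illustration, not part of the dataset.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: the listing does not state the image format, so several
# common extensions are accepted here.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_by_letter(root):
    """Map each top-level subfolder name to the sorted image paths under it."""
    root = Path(root)
    index = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            continue  # skip images that sit directly in the root folder
        index[parts[0]].append(path)
    return dict(index)


def per_letter_accuracy(records):
    """Compute exact-match word accuracy for each letter group.

    `records` is an iterable of (letter, true_word, predicted_word) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for letter, truth, predicted in records:
        total[letter] += 1
        if truth == predicted:
            correct[letter] += 1
    return {letter: correct[letter] / total[letter] for letter in total}
```

Calling `index_by_letter` on the extracted dataset folder gives the number of images per letter via `len(index[letter])`, and feeding a model's predictions into `per_letter_accuracy` shows which letter groups it handles worst. How the ground-truth word for each image is obtained depends on the label format shipped with the dataset, which is not specified here.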

Created:

2025-06-16
