RAVI: Synthetic Urdu Text Image Dataset for OCR

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/mhy5vxnths
Description:
The RAVI dataset is a synthetic image dataset designed to support the development and training of Urdu OCR (Optical Character Recognition) models. It consists of 99,000 high-resolution images (256x256 pixels), each containing a single Urdu word rendered in black text on a white background. Each image is labeled with its corresponding Urdu word, which enables both supervised training and evaluation of word-level OCR systems.

The text is rendered in “Jameel Noori Nastaleeq”, a widely used Nastaliq-style Urdu font, at font size 40. The dataset is organized into subfolders corresponding to the letters of the Urdu alphabet, which simplifies categorization and retrieval and allows model performance to be evaluated per letter.

The dataset is aimed at researchers and developers working on CNN-based OCR systems for printed Urdu text and, in future work, for handwritten Urdu text. It can serve as a benchmark for word-level OCR models, sequence prediction architectures, and other deep learning applications in low-resource languages.

Key Features:
- 99,000 annotated images
- Image resolution: 256x256 pixels
- Black Urdu text on white background
- Font: Jameel Noori Nastaleeq, size 40
- Organized alphabetically by Urdu letters
- Suitable for training, validation, and benchmarking of OCR systems
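
Because the images are grouped into one subfolder per Urdu letter, a per-letter index and a stratified train/validation split are straightforward to build. The following is a minimal sketch, not the dataset authors' tooling. It assumes a layout of `<root>/<letter folder>/<image file>` and common image extensions, neither of which is confirmed by the description above. The function names `index_by_letter` and `stratified_split` are hypothetical. How each word label is stored (file name, separate label file, or otherwise) is not stated, so the sketch tracks only the letter folder and the image path.

```python
import random
from pathlib import Path

# Assumed image extensions; the dataset description does not name the file format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_by_letter(root):
    """Map each subfolder name under `root` to a sorted list of its image paths."""
    index = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(
            f for f in folder.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:
            index[folder.name] = images
    return index


def stratified_split(index, val_fraction=0.1, seed=0):
    """Split every letter folder separately so each letter appears in both sets.

    Returns two lists of (letter, path) pairs: train and validation.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for letter, paths in index.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        val.extend((letter, p) for p in shuffled[:n_val])
        train.extend((letter, p) for p in shuffled[n_val:])
    return train, val
```

Calling `index_by_letter` on the extracted dataset folder and passing the result to `stratified_split` yields two lists of `(letter, path)` pairs. Keeping the letter alongside each path makes it easy to report accuracy per letter folder, which is the kind of character-specific evaluation the folder layout is meant to support.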
Created:
2025-06-16