Urdu Text Scene Images

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14885940

Resource summary:

Description: Extracting Urdu text from natural scene images poses unique challenges, largely due to the lack of publicly available datasets. To address this, we offer an enriched dataset containing 500 high-quality images of Urdu text captured in real-world environments. The images cover diverse settings, lighting conditions, and backgrounds, which makes the dataset well suited to researchers and developers working on Urdu Optical Character Recognition (OCR) systems.

Dataset Structure

The dataset is structured as follows:

Training Set (Training Raw): Raw images featuring Urdu text, for model training.
Test Set (Test Raw): A separate set of images for model testing and validation.
Non-Text Set (Non-Text Raw): Scene images with no Urdu text, included to reduce false positives and improve text/non-text classification models.

Applications

The dataset is designed to support the following applications:

Urdu Text Detection & Recognition: Building and fine-tuning OCR models for Urdu script in natural scenes.
Multilingual OCR Systems: Extending existing text recognition systems to cover Urdu, especially alongside South Asian languages with similar script structures.
Autonomous Driving & Navigation Systems: Recognizing Urdu text on street signs, direction boards, and in public places, improving functionality in Urdu-speaking regions.
Augmented Reality (AR) Applications: Real-time translation or interpretation of Urdu text in natural scenes, for tourists or native speakers.

Potential Use Cases

Multilingual Document Digitization: The dataset can be integrated into multilingual digitization systems, where recognizing Urdu text against complex backgrounds is critical.
Urban Planning & Smart Cities: The dataset can aid the development of systems that recognize text in public areas for smart city initiatives and urban planning efforts.
Mobile Applications: The dataset can be used to enhance mobile apps that need to extract and recognize Urdu text for translation or user interaction.

Future Enhancements

Further dataset releases could include a broader array of text instances, with more variation in fonts and languages (including mixed-language scenarios), and additional annotations such as bounding boxes for character-level recognition. This would extend the dataset's utility in fields like document analysis, smart OCR solutions, and advanced multilingual systems.

This dataset is sourced from Kaggle.
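The three-part structure described above (training, test, and non-text images) could be indexed with a short script along the following lines. This is a minimal sketch, not the dataset's official loader: the folder names "Training Raw", "Test Raw", and "Non-Text Raw" are taken from the description and assumed to be the on-disk directory names, the list of image extensions is a guess, and `index_dataset` and `text_vs_non_text` are hypothetical helper names.

```python
from pathlib import Path

# ASSUMPTION: directory names mirror the labels used in the description;
# adjust them to match the archive actually downloaded.
SPLIT_DIRS = {
    "train": "Training Raw",
    "test": "Test Raw",
    "non_text": "Non-Text Raw",
}

# ASSUMPTION: the image formats are not stated in the description.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def index_dataset(root):
    """Return {split: sorted list of image paths}; missing folders map to []."""
    root = Path(root)
    index = {}
    for split, dirname in SPLIT_DIRS.items():
        folder = root / dirname
        if not folder.is_dir():
            index[split] = []
            continue
        index[split] = sorted(
            p for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return index


def text_vs_non_text(index):
    """Yield (path, has_urdu_text) pairs for a text / non-text classifier.

    Training images are treated as positives and non-text images as negatives.
    """
    for path in index["train"]:
        yield path, True
    for path in index["non_text"]:
        yield path, False
```

Calling `index_dataset("path/to/extracted/archive")` and then `list(text_vs_non_text(index))` would give a flat list of labelled paths that can be passed to whichever image-loading library is in use. The test folder is deliberately kept out of the labelled pairs so it stays available for held-out evaluation.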

Created:

2025-02-18
