lumimusta/Low-light_Scene_Text_Dataset

Name: lumimusta/Low-light_Scene_Text_Dataset
Creator: lumimusta
Published: 2026-04-16 13:17:53
License: MIT (per the dataset card metadata)

Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/lumimusta/Low-light_Scene_Text_Dataset


Description:

---
license: mit
task_categories:
- image-to-text
language:
- en
- es
pretty_name: Low-light Scene Text Dataset
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: test
    path: test.jsonl
  - split: test_real
    path: test_real.jsonl
---

# Low-light Scene Text Dataset

This repository provides a low-light scene text recognition dataset for studying text recognition under challenging illumination conditions.

The dataset is designed to support research on Low-light Scene Text Recognition (LLSTR), where text images may suffer from low contrast, noise, uneven illumination, blur, and other degradations commonly observed in nighttime or poorly lit environments.

The dataset contains two main parts:

- **LSTR**: a large-scale low-light scene text recognition dataset derived from well-lit scene text datasets, including ICDAR2015, IIIT5K, and WordArt.
- **ESTR**: a real-world nighttime scene text evaluation set containing real low-light street-scene images with English and Spanish text.

The goal of this dataset is to provide a benchmark for evaluating OCR and low-light image enhancement methods in dark environments, especially for approaches that aim to preserve text readability rather than only improve visual brightness.

## Dataset Files

This repository contains the following files:

| File | Description |
|---|---|
| `low_light_train.zip` | Low-light training images |
| `well_lit_train.zip` | Corresponding well-lit training images |
| `low_light_test.zip` | Low-light test images |
| `well_lit_test.zip` | Corresponding well-lit test images |
| `low_light_test_real.zip` | Real-world low-light test images |
| `train.jsonl` | Training annotations |
| `test.jsonl` | Test annotations |
| `test_real.jsonl` | Real-world test annotations |
| `train_label.txt` | Training labels |
| `test_label.txt` | Test labels |
| `low_light_test_real.txt` | Labels for the real-world low-light test set |

## Splits

The dataset is organized into the following splits:

| Split | Annotation File | Image Archives |
|---|---|---|
| `train` | `train.jsonl` | `low_light_train.zip`, `well_lit_train.zip` |
| `test` | `test.jsonl` | `low_light_test.zip`, `well_lit_test.zip` |
| `test_real` | `test_real.jsonl` | `low_light_test_real.zip` |

## Dataset Description

Low-light scene text recognition is challenging because images captured in dark environments often contain weak contrast, noise, color distortion, and uneven illumination. These degradations can significantly reduce the performance of standard OCR models trained mainly on well-lit images.

This dataset provides low-light and well-lit scene text images for training and evaluation. The low-light portion is based on scene text images from ICDAR2015, IIIT5K, and WordArt, while the real-world evaluation subset contains nighttime street-scene images collected under natural low-light conditions. The real-world subset includes diverse text appearances, backgrounds, fonts, and languages, with text instances in English and Spanish.

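
The split layout in the Splits table can also be written down as a small Python mapping, which makes it easier to extract only the archives a given split needs. The following is a minimal sketch, not part of the dataset's own tooling: the names `SPLITS` and `extract_split` are hypothetical, and it assumes all archives were downloaded into a single directory and should be unpacked into folders named after each archive, as the `unzip` commands in the Usage section do.

```python
import zipfile
from pathlib import Path

# Split layout copied from the Splits table.
# SPLITS and extract_split are illustrative names, not part of the dataset.
SPLITS = {
    "train": {
        "annotations": "train.jsonl",
        "archives": ["low_light_train.zip", "well_lit_train.zip"],
    },
    "test": {
        "annotations": "test.jsonl",
        "archives": ["low_light_test.zip", "well_lit_test.zip"],
    },
    "test_real": {
        "annotations": "test_real.jsonl",
        "archives": ["low_light_test_real.zip"],
    },
}


def extract_split(split, root):
    """Extract each archive of `split` found under `root`.

    Returns the list of directories that were created. Archives that
    have not been downloaded are skipped rather than raising an error.
    """
    root = Path(root)
    out_dirs = []
    for name in SPLITS[split]["archives"]:
        archive = root / name
        if not archive.exists():
            continue  # archive not downloaded yet
        target = root / archive.stem  # e.g. low_light_train.zip -> low_light_train/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        out_dirs.append(target)
    return out_dirs
```

For example, `extract_split("test_real", ".")` would unpack `low_light_test_real.zip` into `./low_light_test_real/` if the archive is present in the current directory.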
## Dataset Statistics

| Dataset Part | Description |
|---|---|
| LSTR | Low-light scene text recognition data derived from well-lit scene text datasets |
| ESTR | Real-world nighttime scene text images for evaluation |

The dataset is intended as a compact benchmark for low-light text recognition research. It is especially useful for evaluating whether OCR systems and low-light enhancement methods can preserve fine text structures such as character strokes, edges, and boundaries.

## Usage

After downloading the dataset, extract the image archives before using the annotation files.

Example:

```bash
unzip low_light_train.zip -d low_light_train
unzip well_lit_train.zip -d well_lit_train
unzip low_light_test.zip -d low_light_test
unzip well_lit_test.zip -d well_lit_test
unzip low_light_test_real.zip -d low_light_test_real
```

The annotation files can then be loaded from the corresponding `.jsonl` files.

Example:

```python
import json

# Read one JSON object per line from the training annotations.
with open("train.jsonl", "r", encoding="utf-8") as f:
    samples = [json.loads(line) for line in f]

# Print the first record to check its fields.
print(samples[0])
```

## Intended Use

This dataset is intended for research on:

- Low-light scene text recognition
- Robust OCR under nighttime or poorly illuminated conditions
- Low-light image enhancement for text readability
- Joint optimization of image enhancement and OCR models
- Benchmarking OCR models under illumination degradation

## Citation

If you use this dataset, please cite the corresponding paper:

```bibtex
@inproceedings{fu2026reading,
  title={Reading in the Dark: Low-light Scene Text Recognition},
  author={Fu, Xuanshuo and Kang, Lei and Valveny, Ernest and Karatzas, Dimosthenis and Vazquez-Corral, Javier},
  booktitle={ICPR},
  year={2026}
}
```

## Notes

The image files are stored as ZIP archives. The Hugging Face Dataset Viewer may only display the `.jsonl` annotation files and may not preview all images inside the ZIP archives directly. Please download and extract the ZIP files for full dataset usage.
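
This card does not document the fields inside the `.jsonl` annotation files, so a cautious first step after the loading example in the Usage section is to check which keys the records actually carry before writing any pairing or training code. The sketch below is one way to do that; `summarize_jsonl` is a hypothetical helper name, and the code makes no assumption about field names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of field names) for a JSON Lines file.

    summarize_jsonl is an illustrative helper, not part of the dataset.
    """
    count = 0
    fields = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                fields.update(record.keys())
    return count, fields


if __name__ == "__main__":
    # File names are taken from the Dataset Files table.
    for name in ("train.jsonl", "test.jsonl", "test_real.jsonl"):
        if Path(name).exists():
            n, fields = summarize_jsonl(name)
            print(name, n, dict(fields))
```

A field whose count is lower than the record count appears only in some records, which is worth knowing before indexing into every sample.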

Provider:

lumimusta
