lumimusta/Low-light_Scene_Text_Dataset
Source: Hugging Face (updated 2026-04-16; indexed 2026-04-26)
Download link:
https://hf-mirror.com/datasets/lumimusta/Low-light_Scene_Text_Dataset
---
license: mit
task_categories:
- image-to-text
language:
- en
- es
pretty_name: Low-light Scene Text Dataset
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: test
    path: test.jsonl
  - split: test_real
    path: test_real.jsonl
---
# Low-light Scene Text Dataset
This repository provides a low-light scene text recognition dataset for studying text recognition under challenging illumination conditions. The dataset is designed to support research on Low-light Scene Text Recognition (LLSTR), where text images may suffer from low contrast, noise, uneven illumination, blur, and other degradations commonly observed in nighttime or poorly lit environments.
The dataset contains two main parts:
- **LSTR**: a large-scale low-light scene text recognition dataset derived from well-lit scene text datasets, including ICDAR2015, IIIT5K, and WordArt.
- **ESTR**: a real-world nighttime scene text evaluation set containing real low-light street-scene images with English and Spanish text.

The goal of this dataset is to provide a benchmark for evaluating OCR and low-light image enhancement methods in dark environments, especially for approaches that aim to preserve text readability rather than only improve visual brightness.
## Dataset Files
This repository contains the following files:
| File | Description |
|---|---|
| `low_light_train.zip` | Low-light training images |
| `well_lit_train.zip` | Corresponding well-lit training images |
| `low_light_test.zip` | Low-light test images |
| `well_lit_test.zip` | Corresponding well-lit test images |
| `low_light_test_real.zip` | Real-world low-light test images |
| `train.jsonl` | Training annotations |
| `test.jsonl` | Test annotations |
| `test_real.jsonl` | Real-world test annotations |
| `train_label.txt` | Training labels |
| `test_label.txt` | Test labels |
| `low_light_test_real.txt` | Labels for the real-world low-light test set |
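Because images and annotations are spread across several archives and text files, a quick completeness check after downloading can save debugging time. The following is a minimal sketch using only the Python standard library; the file names are taken from the table above, while `EXPECTED_FILES` and `missing_files` are illustrative names, not part of the dataset.

```python
from pathlib import Path

# File names as listed in the "Dataset Files" table of this card.
EXPECTED_FILES = [
    "low_light_train.zip",
    "well_lit_train.zip",
    "low_light_test.zip",
    "well_lit_test.zip",
    "low_light_test_real.zip",
    "train.jsonl",
    "test.jsonl",
    "test_real.jsonl",
    "train_label.txt",
    "test_label.txt",
    "low_light_test_real.txt",
]


def missing_files(root):
    """Return the expected files that are not present directly under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the repository files were downloaded into the current directory.
    missing = missing_files(".")
    print("All files present." if not missing else f"Missing: {missing}")
```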
## Splits
The dataset is organized into the following splits:
| Split | Annotation File | Image Archive |
|---|---|---|
| `train` | `train.jsonl` | `low_light_train.zip`, `well_lit_train.zip` |
| `test` | `test.jsonl` | `low_light_test.zip`, `well_lit_test.zip` |
| `test_real` | `test_real.jsonl` | `low_light_test_real.zip` |
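The `train` and `test` splits each pair a low-light archive with a well-lit one, while `test_real` has only low-light images. A minimal sketch of pairing the two versions of each image follows. It rests on two assumptions this card does not confirm: the archives were extracted into same-named directories (as in the `unzip` commands under "Usage"), and a low-light image and its well-lit counterpart share the same file name. `SPLITS` and `pair_images` are hypothetical helpers; check the naming convention against the actual files before relying on them.

```python
from pathlib import Path

# Split layout from the "Splits" table. "well" is None for test_real,
# which ships no well-lit counterpart.
SPLITS = {
    "train": {"annotations": "train.jsonl", "low": "low_light_train", "well": "well_lit_train"},
    "test": {"annotations": "test.jsonl", "low": "low_light_test", "well": "well_lit_test"},
    "test_real": {"annotations": "test_real.jsonl", "low": "low_light_test_real", "well": None},
}

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def _images(directory):
    """All image files below `directory`, in a stable order."""
    return sorted(p for p in directory.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)


def pair_images(root, split):
    """Map each low-light image path to its well-lit counterpart, or None.

    ASSUMPTION: counterparts are matched by identical file name, which
    tolerates different top-level folders inside the two archives.
    """
    cfg = SPLITS[split]
    root = Path(root)
    well_by_name = {}
    if cfg["well"]:
        well_by_name = {p.name: p for p in _images(root / cfg["well"])}
    return {p: well_by_name.get(p.name) for p in _images(root / cfg["low"])}
```

Unmatched low-light images map to `None`, so `[k for k, v in pair_images(".", "train").items() if v is None]` lists any images the naming assumption fails to pair.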
## Dataset Description
Low-light scene text recognition is challenging because images captured in dark environments often contain weak contrast, noise, color distortion, and uneven illumination. These degradations can significantly reduce the performance of standard OCR models trained mainly on well-lit images.
This dataset provides low-light and well-lit scene text images for training and evaluation. The low-light portion is based on scene text images from ICDAR2015, IIIT5K, and WordArt, while the real-world evaluation subset contains nighttime street-scene images collected under natural low-light conditions.
The real-world subset includes diverse text appearances, backgrounds, fonts, and languages, with text instances in English and Spanish.
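This card does not describe how the LSTR images were derived from their well-lit sources. For readers who want to experiment with the kinds of degradation described above on their own images, the following is a generic, hypothetical simulation (gamma darkening, an exposure drop, and additive Gaussian noise) written with NumPy. It is not the authors' pipeline, and the default parameter values are arbitrary.

```python
import numpy as np


def simulate_low_light(image, gamma=2.5, scale=0.3, noise_std=0.02, seed=None):
    """Darken a well-lit uint8 image and add sensor-like noise.

    Generic illustration only, NOT the procedure used to build LSTR:
    gamma > 1 pushes mid-tones toward black, `scale` lowers overall
    exposure, and additive Gaussian noise mimics the grain of dark captures.
    Returns an array with the same shape and dtype (uint8) as `image`.
    """
    rng = np.random.default_rng(seed)
    x = image.astype(np.float64) / 255.0               # to [0, 1]
    x = scale * np.power(x, gamma)                     # gamma curve, then exposure drop
    x = x + rng.normal(0.0, noise_std, size=x.shape)   # additive noise
    return (np.clip(x, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

Setting `noise_std=0.0` isolates the brightness and contrast loss, which can help separate the effect of darkening from that of noise when testing an OCR model.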
## Dataset Composition
| Dataset Part | Description |
|---|---|
| LSTR | Low-light scene text recognition data derived from well-lit scene text datasets |
| ESTR | Real-world nighttime scene text images for evaluation |

The dataset is intended as a compact benchmark for low-light text recognition research. It is especially useful for evaluating whether OCR systems and low-light enhancement methods can preserve fine text structures such as character strokes, edges, and boundaries.
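One rough way to check whether an enhancement method preserves strokes and edges is to compare the gradient structure of an enhanced image with its well-lit reference (available for the `train` and `test` splits). The sketch below computes the Pearson correlation between gradient-magnitude maps. It is a hypothetical diagnostic, not a metric defined by this dataset or its paper, and it assumes both inputs are grayscale NumPy arrays of the same shape.

```python
import numpy as np


def gradient_magnitude(gray):
    """Per-pixel gradient magnitude of a 2-D array (central finite differences)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)


def edge_similarity(enhanced, reference):
    """Pearson correlation between the gradient-magnitude maps of two images.

    Hypothetical proxy for edge preservation: 1.0 means the edge maps
    vary identically up to scale and offset. Returns 0.0 when either
    image has no edges, since the correlation is then undefined.
    """
    a = gradient_magnitude(enhanced).ravel()
    b = gradient_magnitude(reference).ravel()
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])
```

Because correlation ignores global scale, a uniformly dim but otherwise sharp result still scores high; this complements, rather than replaces, measuring OCR accuracy on the enhanced images.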
## Usage
After downloading the dataset, extract the image archives before using the annotation files.
Example:
```bash
unzip low_light_train.zip -d low_light_train
unzip well_lit_train.zip -d well_lit_train
unzip low_light_test.zip -d low_light_test
unzip well_lit_test.zip -d well_lit_test
unzip low_light_test_real.zip -d low_light_test_real
```
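Where the `unzip` utility is not available (for example in a default Windows shell), the same extraction can be done with Python's standard-library `zipfile` module. This is a minimal sketch that mirrors the commands above, extracting each archive into a directory named after it; `extract_all` is an illustrative helper name.

```python
import zipfile
from pathlib import Path

# Archive names from the "Dataset Files" table.
ARCHIVES = (
    "low_light_train.zip",
    "well_lit_train.zip",
    "low_light_test.zip",
    "well_lit_test.zip",
    "low_light_test_real.zip",
)


def extract_all(root=".", archives=ARCHIVES):
    """Extract each archive under `root` into `root/<archive stem>/`.

    Archives that are not present are skipped and their names returned,
    so the caller can report an incomplete download.
    """
    root = Path(root)
    skipped = []
    for name in archives:
        src = root / name
        if not src.is_file():
            skipped.append(name)
            continue
        with zipfile.ZipFile(src) as zf:
            zf.extractall(root / src.stem)  # e.g. low_light_train.zip -> low_light_train/
    return skipped
```

Calling `extract_all(".")` from the download directory produces the same five directories as the shell commands.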
The annotations can then be read from the corresponding `.jsonl` files, which store one JSON object per line.
Example:
```python
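import json

# Each non-empty line of a .jsonl file is one JSON-encoded annotation record.
with open("train.jsonl", "r", encoding="utf-8") as f:
    samples = [json.loads(line) for line in f if line.strip()]  # skip blank lines

print(samples[0])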
```
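The card does not document the fields inside each annotation record. The sketch below is a slightly more defensive loader that reports the file and line number of any malformed record, plus a helper that counts which top-level keys appear, as a quick way to discover the schema. Both function names are illustrative.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    samples = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                samples.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
    return samples


def field_counts(samples):
    """Count how many records contain each top-level key."""
    return Counter(key for sample in samples for key in sample)
```

For example, `field_counts(load_jsonl("test_real.jsonl"))` shows which keys the real-world annotations use and whether every record has them.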
## Intended Use
This dataset is intended for research on:
- Low-light scene text recognition
- Robust OCR under nighttime or poorly illuminated conditions
- Low-light image enhancement for text readability
- Joint optimization of image enhancement and OCR models
- Benchmarking OCR models under illumination degradation
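For benchmarking, word accuracy and normalized edit distance are commonly reported in scene text recognition. The card does not state which metrics or normalization the accompanying paper uses, so the sketch below is one reasonable convention, not the official evaluation protocol. Case-insensitive comparison is an assumption (toggle it with `case_sensitive`), and strings are NFC-normalized so that composed and decomposed forms of Spanish accented characters such as `ñ` compare equal.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def _normalize(text, case_sensitive):
    # NFC makes composed and decomposed accents (e.g. "ñ") compare equal.
    text = unicodedata.normalize("NFC", text)
    return text if case_sensitive else text.lower()


def evaluate(predictions, labels, case_sensitive=False):
    """Return word accuracy and 1 - mean normalized edit distance (both in [0, 1])."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally long, non-empty prediction and label lists")
    correct, ned_sum = 0, 0.0
    for pred, gt in zip(predictions, labels):
        pred, gt = _normalize(pred, case_sensitive), _normalize(gt, case_sensitive)
        correct += pred == gt
        # Normalize by the longer string so each term lies in [0, 1].
        ned_sum += edit_distance(pred, gt) / max(len(pred), len(gt), 1)
    n = len(labels)
    return {"word_accuracy": correct / n, "1-NED": 1.0 - ned_sum / n}
```

Running `evaluate` once on OCR output from raw low-light images and once on output from enhanced images gives a direct measure of whether enhancement helps recognition, in line with the card's focus on readability over brightness.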
## Citation
If you use this dataset, please cite the corresponding paper:
```bibtex
@inproceedings{fu2026reading,
title={Reading in the Dark: Low-light Scene Text Recognition},
author={Fu, Xuanshuo and Kang, Lei and Valveny, Ernest and Karatzas, Dimosthenis and Vazquez-Corral, Javier},
booktitle={ICPR},
year={2026}
}
```
## Notes
The image files are stored as ZIP archives. The Hugging Face Dataset Viewer may only display the `.jsonl` annotation files and may not preview all images inside the ZIP archives directly. Please download and extract the ZIP files for full dataset usage.
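If disk space is limited, images can also be read straight from the archives without extracting them. A minimal sketch with the standard-library `zipfile` module follows; `iter_zip_images` is an illustrative name, and the set of image extensions is an assumption since the card does not list the image formats.

```python
import zipfile

# Assumed image extensions; adjust to the formats actually in the archives.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".bmp")


def iter_zip_images(zip_path):
    """Yield (member_name, raw_bytes) for each image inside a ZIP archive.

    Reads directly from the archive, so no extraction step is needed.
    Decode the bytes with an image library of your choice, for example
    PIL.Image.open(io.BytesIO(raw_bytes)).
    """
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(IMAGE_SUFFIXES):
                continue
            yield info.filename, zf.read(info)
```

The member names yielded here can then be matched against the image references in the `.jsonl` annotations, once their exact field names are known.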
Provided by:
lumimusta



