MediaText: a media industry-based dataset for scene text detection
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12796379
Resource description:
Media-Text
The Media-Text dataset comprises images of banners, posters, covers, and other images characteristic of the media industry.
Full paper is available here: Media-Text: a Media Industry-Based Dataset for Scene Text Detection
DATASET DESCRIPTION
400 images
7 744 annotated text instances
973 annotations have been marked as illegible for the task of text recognition
659 texts have been marked as "do not care" (###) for scene text detection.
Images are represented by 193 unique resolutions.
Annotation Format - Each image has a corresponding gt_*.txt file containing one annotation per text instance: a bounding box (defined by its 4 corners), a transcription, and a boolean flag indicating whether the text is illegible for OCR. The format is similar to ICDAR15 annotations.
x1, y1, x2, y2, x3, y3, x4, y4, transcription, OCR Flag
Example: 37,68,198,49,214,181,52,200,LADIES,False
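The annotation line above can be parsed with a few lines of Python. This is a minimal sketch, not the authors' loader: it reads the eight leading numbers as four (x, y) corner pairs, as the example line suggests, and it assumes that coordinates are integers, that the flag is written literally as True or False, and that a transcription may itself contain commas. The names TextInstance and parse_annotation_line are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextInstance:
    points: list[tuple[int, int]]  # four (x, y) corners of the bounding box
    transcription: str
    illegible: bool                # True if marked illegible for OCR

    @property
    def dont_care(self) -> bool:
        # "###" marks a "do not care" region for detection
        return self.transcription == "###"


def parse_annotation_line(line: str) -> TextInstance:
    """Parse one line of a gt_*.txt file (assumed layout, see lead-in)."""
    line = line.lstrip("\ufeff").strip()  # drop a possible BOM and newline
    parts = line.split(",", 8)            # 8 coordinates + remainder
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates plus text: {line!r}")
    coords = [int(v) for v in parts[:8]]
    # The flag is the last comma-separated field; everything before it is
    # the transcription, which may contain commas of its own.
    transcription, sep, flag = parts[8].rpartition(",")
    flag = flag.strip().lower()
    if not sep or flag not in ("true", "false"):
        raise ValueError(f"missing or invalid OCR flag: {line!r}")
    points = list(zip(coords[0::2], coords[1::2]))
    return TextInstance(points, transcription, flag == "true")


inst = parse_annotation_line("37,68,198,49,214,181,52,200,LADIES,False")
```

Splitting off only the first eight commas and taking the flag from the right end keeps transcriptions with embedded commas intact, which is a common precaution with ICDAR15-style files.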
ACKNOWLEDGMENT
This work was supported by the Silesian University of Technology (SUT) through the subsidy for maintaining and developing research potential grant in 2024 for young researchers, No. 2/070/BKM24/0058, and by the Ministry of Science and Higher Education "Implementation Doctorate" No. DWD/5/0511/2021.
Thanks to the graphic department of the media-press group for preparing and sharing graphics thematically related to the dataset.
LICENSE
Annotations created by the authors are licensed under the CC-BY-4.0 license. Images from the Open-Image-V7 dataset are licensed according to their source information. Source information is given in the file metadata.csv, which defines the metadata of each file (the file name corresponds to the ImageID column).
Images whose name corresponds to the media_press pattern are provided for academic use.
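To decide which licensing rule applies to a given image, the two cases above can be separated programmatically. The following is a rough sketch under stated assumptions: that "media_press" appears as a substring of the file name, that the ImageID column holds the file name without its extension, and that metadata.csv is UTF-8 encoded. The helper names load_metadata and image_source are hypothetical, and no column other than ImageID is assumed.

```python
import csv
from pathlib import Path


def load_metadata(csv_path):
    """Index the rows of metadata.csv by their ImageID value."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["ImageID"]: row for row in csv.DictReader(f)}


def image_source(file_name, metadata):
    """Classify an image by where its licensing terms come from."""
    stem = Path(file_name).stem  # file name without extension (assumption)
    if "media_press" in stem:
        return "media_press"     # provided for academic use
    if stem in metadata:
        return "open_images"     # license given by the metadata.csv row
    return "unknown"             # neither rule matched; check manually
```

For an image classified as "open_images", the matching row in the returned dictionary holds whatever source and license fields metadata.csv actually provides.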
CITING THE RELATED WORKS
Please cite the related works in your publications if it helps your research:
```
@inproceedings{inproceedings,
author = {Kalisz, Seweryn and Marczyk, Michał and Polanska, Joanna},
booktitle = {Modelling and simulation 2024. The 2024 European Simulation and Modelling Conference},
editor = {Manuel Graña and J. David Nuñez-Gonzalez},
year = {2024},
month = {10},
pages = {138-144},
publisher = {EUROSIS-ETI},
title = {Media-Text: a Media Industry-Based Dataset for Scene Text Detection}
}
```
Created:
2024-10-29



