
MediaText: a media industry-based dataset for scene text detection

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/12796379
Resource description:
Media-Text

The Media-Text dataset comprises images of banners, posters, covers, and other images characteristic of the media industry. The full paper is available here: Media-Text: a Media Industry-Based Dataset for Scene Text Detection

DATASET DESCRIPTION

- 400 images
- 7,744 annotated text instances
- 973 annotations have been marked as illegible for the task of text recognition
- 659 texts have been marked as "do not care" (###) for scene text detection
- Images are represented by 193 unique resolutions

Annotation Format

Each image has a corresponding gt_*.txt file, which contains annotations in bounding box format (defined by 4 corners), the transcription, and a boolean flag that indicates whether the text is illegible for OCR. The proposed format is similar to the ICDAR15 annotations.

x1, y1, ..., x4, y4, transcription, OCR Flag

Example: 37,68,198,49,214,181,52,200,LADIES,False

ACKNOWLEDGMENT

This work was supported by the Silesian University of Technology (SUT) through the subsidy for maintaining and developing research potential grant in 2024 for young researchers, No. 2/070/BKM24/0058, and by the Ministry of Science and Higher Education "Implementation Doctorate" No. DWD/5/0511/2021. Thanks to the graphic department of the media-press group for preparing graphics thematically related to the dataset and for making it possible to share them.

LICENSE

Annotations created by the authors are licensed under the CC-BY-4.0 license. Images from the Open-Image-V7 dataset are licensed according to their source information. Source information is defined in the metadata.csv file, which holds the metadata of each file (the file name corresponds to the ImageID column). Images whose names match the media_press pattern are provided for academic use.

CITING THE RELATED WORKS

Please cite the related works in your publications if they help your research:

```
@inproceedings{inproceedings,
  author = {Kalisz, Seweryn and Marczyk, Michał and Polanska, Joanna},
  booktitle = {Modelling and simulation 2024. The 2024 European Simulation and Modelling Conference},
  editor = {Manuel Graña and J. David Nuñez-Gonzalez},
  year = {2024},
  month = {10},
  pages = {138-144},
  publisher = {EUROSIS-ETI},
  title = {Media-Text: a Media Industry-Based Dataset for Scene Text Detection}
}
```
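The annotation format described above (eight corner coordinates, a transcription, and an OCR flag per line) could be read along the following lines. This is a minimal sketch, not code shipped with the dataset: the names `TextInstance` and `parse_annotation_line` are hypothetical, and it assumes that coordinates are integers in x, y order, that the flag is the literal text `True` or `False`, and that a transcription may itself contain commas.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextInstance:
    """One annotated text region (hypothetical container, not part of the dataset)."""
    corners: List[Tuple[int, int]]  # four (x, y) corner points
    transcription: str
    illegible: bool  # OCR flag: True if the text is illegible for recognition

    @property
    def dont_care(self) -> bool:
        # Assumption: "do not care" regions use the transcription "###".
        return self.transcription == "###"


def parse_annotation_line(line: str) -> TextInstance:
    """Parse one line of a gt_*.txt file into a TextInstance."""
    # Drop a possible byte-order mark and line endings before splitting.
    parts = line.strip("\ufeff\r\n").split(",")
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 comma-separated fields, got {len(parts)}")
    coords = [int(value) for value in parts[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))
    # Everything between the coordinates and the final flag is the transcription,
    # re-joined in case the text itself contained commas.
    transcription = ",".join(parts[8:-1])
    illegible = parts[-1].strip().lower() == "true"
    return TextInstance(corners, transcription, illegible)


instance = parse_annotation_line("37,68,198,49,214,181,52,200,LADIES,False")
print(instance.corners, instance.transcription, instance.illegible)
```

Taking the first eight fields from the left and the flag from the right keeps the parser tolerant of commas inside the transcription, which a plain ten-way split would not handle.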
Created:
2024-10-29