yunusserhat/MSRA-TD500-Dataset

Name: yunusserhat/MSRA-TD500-Dataset
Creator: yunusserhat
Published: 2024-04-30 12:07:20
License: No description available

Hugging Face: updated 2024-04-30, indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/yunusserhat/MSRA-TD500-Dataset

Resource description:

---
tags:
- text-recognition
- dataset
- text-detection
- scene-text
- scene-text-recognition
- scene-text-detection
- text-detection-recognition
- icdar
- total-text
- curve-text
task_categories:
- text-retrieval
language:
- en
- zh
size_categories:
- n<1K
---

# MSRA Text Detection 500 Database (MSRA-TD500)

The MSRA Text Detection 500 Database (MSRA-TD500) is a publicly released benchmark for evaluating text detection algorithms. It is intended to track recent progress in text detection in natural images, with a particular focus on text of arbitrary orientation.

## Dataset Overview

MSRA-TD500 contains 500 natural images taken with a pocket camera in indoor (e.g., office and mall) and outdoor (e.g., street) scenes. The images depict elements such as:

- **Indoor**: Signs, doorplates, and caution plates.
- **Outdoor**: Guide boards and billboards, often set against complex backgrounds.

Image resolutions range from 1296x864 to 1920x1280.

The dataset is challenging because of the diversity of the text and the complexity of the backgrounds. Text appears in different languages (Chinese, English, or both), fonts, sizes, colors, and orientations. Backgrounds may include elements such as vegetation and repeated patterns that are difficult to distinguish from text.

## Example Images

![Typical images from MSRA-TD500](example_image_path.png)

*Figure 1: Typical images from MSRA-TD500 showing texts labeled as difficult due to factors like blur or occlusion.*

## Dataset Structure

The dataset is split into two sets:

- **Training Set**: 300 images randomly selected from the original dataset.
- **Test Set**: 200 images.

All images are fully annotated. The basic unit of annotation is the text line, which differs from the ICDAR datasets, where the basic unit is the word.

## Ground Truth Annotation

Ground truth is generated by locating each text line and bounding it with a four-vertex polygon, then fitting a minimum-area rectangle around that polygon.

![Ground truth generation](ground_truth_image_path.png)

*Figure 2: Ground truth generation process.*

## Evaluation Protocol

The evaluation protocol is designed to accommodate text of arbitrary orientation and uses minimum-area rectangles, which fit the text more tightly. Texts labeled as "difficult" pose additional challenges such as small size, occlusion, blur, or truncation. Missed detections of such texts are not penalized.

## Ground Truth File Format

Each image has a corresponding ground truth file. Each line in the file describes one text line and includes a label that marks "difficult" texts.

```
# Ground Truth Format Example
Index; Text Coords; Difficulty
0; x1,y1,x2,y2,x3,y3,x4,y4; 0
```

![](illustration.png)

*Figure 3: Illustration of the ground truth file format.*

## Reference

C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu. "Detecting Texts of Arbitrary Orientations in Natural Images." CVPR 2012.

Provider:

yunusserhat

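Summary of Original Information

MSRA Text Detection 500 Database (MSRA-TD500)

Dataset Overview

MSRA-TD500 is a publicly released benchmark dataset for evaluating text detection algorithms. It is intended to track recent progress in text detection in natural images, with a particular focus on text of arbitrary orientation.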

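Data Source

MSRA-TD500 contains 500 natural images taken with a pocket camera in indoor (e.g., office and mall) and outdoor (e.g., street) scenes. The images depict elements such as:

Indoor: signs, doorplates, and caution plates.
Outdoor: guide boards and billboards, often against complex backgrounds.

Image resolutions range from 1296x864 to 1920x1280. The dataset is challenging because of the diversity of the text and the complexity of the backgrounds: text appears in different languages (Chinese, English, or both), fonts, sizes, colors, and orientations. Backgrounds may include elements that are difficult to distinguish from text, such as vegetation and repeated patterns.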

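Dataset Structure

The dataset is split into two parts:

Training set: 300 images randomly selected from the original dataset.
Test set: 200 images.

All images are fully annotated. The basic unit of annotation is the text line, which differs from the ICDAR datasets, where the basic unit is the word.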

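Ground Truth Annotation

Ground truth is generated by locating each text line and bounding it with a four-vertex polygon, then fitting a minimum-area rectangle around that polygon.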
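The rectangle-fitting step can be illustrated with a small, dependency-free sketch. It is not the dataset authors' tooling: the function names `convex_hull` and `min_area_rect` are hypothetical, and the approach (trying every convex-hull edge as a candidate rectangle side) is one standard way to find a minimum-area enclosing rectangle. Libraries such as OpenCV offer `cv2.minAreaRect` for the same purpose.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(polygon):
    """Return (area, corners) of the smallest rectangle enclosing `polygon`.

    A minimum-area enclosing rectangle has one side collinear with an edge
    of the convex hull, so only the hull edges need to be tried.
    """
    hull = convex_hull([(float(x), float(y)) for x, y in polygon])
    if len(hull) < 3:
        raise ValueError("polygon is degenerate (fewer than 3 distinct hull points)")
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the edge
        vx, vy = -uy, ux                                  # unit normal to the edge
        us = [x * ux + y * uy for x, y in hull]           # hull projected on the edge
        vs = [x * vx + y * vy for x, y in hull]           # hull projected on the normal
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            extents = ((min(us), min(vs)), (max(us), min(vs)),
                       (max(us), max(vs)), (min(us), max(vs)))
            # Map the rectangle corners back to image coordinates.
            corners = [(u * ux + v * vx, u * uy + v * vy) for u, v in extents]
            best = (area, corners)
    return best


# A square rotated by 45 degrees: the axis-aligned box has area 4,
# while the fitted rectangle follows the text orientation.
area, corners = min_area_rect([(0, 1), (1, 0), (2, 1), (1, 2)])
```

For rotated text, the fitted rectangle is noticeably tighter than an axis-aligned bounding box, which is the motivation the dataset card gives for using it.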

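Evaluation Protocol

The evaluation protocol is designed to accommodate text of arbitrary orientation and uses minimum-area rectangles, which fit the text more tightly. Texts labeled as "difficult" pose additional challenges such as small size, occlusion, blur, or truncation. Missed detections of such texts are not penalized.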
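The effect of exempting "difficult" texts can be sketched as follows. This is a simplified illustration, not the official evaluation script. The function name `score_detections` is hypothetical, the matching between detections and ground truth is assumed to be computed elsewhere, and skipping detections that hit a difficult text when computing precision is an assumption made here; the dataset card only states that misses of such texts are not penalized.

```python
def score_detections(gt_difficult, matches):
    """Compute (precision, recall, f_measure) with "difficult" texts exempted.

    gt_difficult: list of bools, one per ground-truth text line.
    matches: one entry per detection, holding the index of the ground-truth
             line it was matched to, or None for a false alarm. How a match
             is decided (e.g. an overlap threshold) is outside this sketch.
    """
    matched_regular = set()
    counted_detections = 0
    for gt_index in matches:
        if gt_index is not None and gt_difficult[gt_index]:
            # Assumption: a hit on a difficult text is neither rewarded nor penalized.
            continue
        counted_detections += 1
        if gt_index is not None:
            # A set means duplicate hits on one line count as a single true positive.
            matched_regular.add(gt_index)

    # Difficult lines are left out of the recall denominator,
    # so missing them costs nothing.
    regular_total = sum(1 for difficult in gt_difficult if not difficult)
    true_positives = len(matched_regular)

    precision = true_positives / counted_detections if counted_detections else 0.0
    recall = true_positives / regular_total if regular_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Two regular lines and one difficult line; three detections:
# one correct, one false alarm, one on the difficult line.
precision, recall, f_measure = score_detections([False, False, True], [0, None, 2])
```

In this example the difficult line affects neither the recall denominator nor the precision count, so both values depend only on the two regular lines.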

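Ground Truth File Format

Each image has a corresponding ground truth file. Each line in the file describes one text line and includes a label that marks "difficult" texts.

Ground Truth Format Example

Index; Text Coords; Difficulty
0; x1,y1,x2,y2,x3,y3,x4,y4; 0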
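A line in the semicolon-separated layout shown above could be parsed along these lines. This is a minimal sketch: the names `TextLine` and `parse_ground_truth_line` are hypothetical, the layout is taken from the example on this page and may differ from the files actually distributed, and treating any nonzero difficulty value as "difficult" is an assumption, since the example only shows `0`.

```python
from typing import List, NamedTuple, Tuple


class TextLine(NamedTuple):
    index: int
    vertices: List[Tuple[float, float]]  # four (x, y) corners
    difficult: bool


def parse_ground_truth_line(line: str) -> TextLine:
    """Parse one 'Index; x1,y1,...,x4,y4; Difficulty' record."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 ';'-separated fields, got {len(fields)}")
    index_text, coords_text, difficulty_text = fields

    values = [float(value) for value in coords_text.split(",")]
    if len(values) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(values)}")
    # Pair up alternating x and y values into four vertices.
    vertices = list(zip(values[0::2], values[1::2]))

    # Assumption: 0 means regular text, anything else means "difficult".
    return TextLine(int(index_text), vertices, int(difficulty_text) != 0)


record = parse_ground_truth_line("0; 10,20,110,20,110,60,10,60; 0")
```

Header or comment lines, such as the first line of the example, would need to be skipped before calling the parser.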