bazyl/GTSRB

Name: bazyl/GTSRB
Creator: bazyl
Published: 2022-10-25 10:39:19
License: No description available

Source: Hugging Face. Updated 2022-10-25. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/bazyl/GTSRB


Resource description:

---
annotations_creators:
- crowdsourced
language_creators:
- found
language: []
license:
- gpl-3.0
multilinguality: []
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- image-classification
task_ids:
- multi-label-image-classification
pretty_name: GTSRB
---

# Dataset Card for GTSRB

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** http://www.sciencedirect.com/science/article/pii/S0893608012000457
- **Repository:** https://github.com/bazylhorsey/gtsrb/
- **Paper:** Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition
- **Leaderboard:** https://benchmark.ini.rub.de/gtsrb_results.html
- **Point of Contact:** bhorsey16@gmail.com

### Dataset Summary

The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers from relevant fields to participate: the competition is designed to allow for participation without special domain knowledge. Our benchmark has the following properties:

- Single-image, multi-class classification problem
- More than 40 classes
- More than 50,000 images in total
- Large, lifelike database

### Supported Tasks and Leaderboards

- [Kaggle](https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign)
- [Original](https://benchmark.ini.rub.de/gtsrb_results.html)

## Dataset Structure

### Data Instances

```
{
  "Width": 31,
  "Height": 31,
  "Roi.X1": 6,
  "Roi.Y1": 6,
  "Roi.X2": 26,
  "Roi.Y2": 26,
  "ClassId": 20,
  "Path": "Train/20/00020_00004_00002.png"
}
```

### Data Fields

- Width: Width of the image
- Height: Height of the image
- Roi.X1: Upper left X coordinate of the region of interest
- Roi.Y1: Upper left Y coordinate of the region of interest
- Roi.X2: Lower right X coordinate of the region of interest
- Roi.Y2: Lower right Y coordinate of the region of interest
- ClassId: Class of the image
- Path: Path of the image

### Data Splits

- Categories: 42
- Train: 39209
- Test: 12630

## Dataset Creation

### Curation Rationale

Recognition of traffic signs is a challenging real-world problem of high industrial relevance. Although commercial systems have reached the market and several studies on this topic have been published, systematic unbiased comparisons of different approaches are missing and comprehensive benchmark datasets are not freely available.

Traffic sign recognition is a multi-class classification problem with unbalanced class frequencies. Traffic signs show a wide range of variation between classes in terms of color, shape, and the presence of pictograms or text. However, there exist subsets of classes (e.g., speed limit signs) that are very similar to each other.

The classifier has to cope with large variations in visual appearance due to illumination changes, partial occlusions, rotations, weather conditions, etc.

Humans are capable of recognizing the large variety of existing road signs with close to 100% correctness. This applies not only to real-world driving, which provides both context and multiple views of a single traffic sign, but also to recognition from single images.


Provider:

bazyl

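Summary of original information

Dataset Card for GTSRB

Dataset Description

Dataset Summary

The German Traffic Sign Benchmark (GTSRB) is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. The benchmark has the following properties: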

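Single-image, multi-class classification problem
More than 40 classes
More than 50,000 images in total
Large, lifelike database

Supported Tasks and Leaderboards

Dataset Structure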

Data Instances

{
  "Width": 31,
  "Height": 31,
  "Roi.X1": 6,
  "Roi.Y1": 6,
  "Roi.X2": 26,
  "Roi.Y2": 26,
  "ClassId": 20,
  "Path": "Train/20/00020_00004_00002.png"
}

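Data Fields

Width: Width of the image
Height: Height of the image
Roi.X1: Upper left X coordinate of the region of interest
Roi.Y1: Upper left Y coordinate of the region of interest
Roi.X2: Lower right X coordinate of the region of interest
Roi.Y2: Lower right Y coordinate of the region of interest
ClassId: Class of the image
Path: Path of the image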
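
As a rough illustration of how these fields fit together, the sketch below turns one annotation record into a crop box. The `SignRecord` class and its method names are hypothetical helpers, not part of the dataset, and treating `Roi.X2` and `Roi.Y2` as inclusive pixel coordinates is an assumption that the card does not confirm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SignRecord:
    """One annotation row (hypothetical helper; attributes mirror the card's fields)."""

    width: int
    height: int
    roi_x1: int
    roi_y1: int
    roi_x2: int
    roi_y2: int
    class_id: int
    path: str

    @classmethod
    def from_dict(cls, row: dict) -> "SignRecord":
        # Map the dotted field names used in the card onto attribute names.
        return cls(
            width=int(row["Width"]),
            height=int(row["Height"]),
            roi_x1=int(row["Roi.X1"]),
            roi_y1=int(row["Roi.Y1"]),
            roi_x2=int(row["Roi.X2"]),
            roi_y2=int(row["Roi.Y2"]),
            class_id=int(row["ClassId"]),
            path=str(row["Path"]),
        )

    def crop_box(self) -> tuple[int, int, int, int]:
        """Return (left, upper, right, lower) with an exclusive right/lower edge.

        Assumption: Roi.X2 and Roi.Y2 are inclusive, so 1 is added to each.
        """
        return (self.roi_x1, self.roi_y1, self.roi_x2 + 1, self.roi_y2 + 1)

    def roi_size(self) -> tuple[int, int]:
        """Width and height of the region of interest, derived from crop_box()."""
        left, upper, right, lower = self.crop_box()
        return (right - left, lower - upper)


# The example record from the card.
example = SignRecord.from_dict(
    {
        "Width": 31,
        "Height": 31,
        "Roi.X1": 6,
        "Roi.Y1": 6,
        "Roi.X2": 26,
        "Roi.Y2": 26,
        "ClassId": 20,
        "Path": "Train/20/00020_00004_00002.png",
    }
)
print(example.crop_box())  # (6, 6, 27, 27)
print(example.roi_size())  # (21, 21)
```

A box in this form can be handed to an image library's crop call, for example Pillow's `Image.crop`, which takes a (left, upper, right, lower) tuple. If the coordinates turn out to be exclusive, drop the `+ 1`.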

Data Splits

Categories: 42
Train: 39209
Test: 12630
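
The split sizes above can be checked against the "more than 50,000 images" figure with a few lines of arithmetic. This is only a worked calculation on the numbers stated in the card; the variable names are arbitrary.

```python
# Split sizes as stated in the card.
TRAIN_IMAGES = 39209
TEST_IMAGES = 12630

total_images = TRAIN_IMAGES + TEST_IMAGES
train_fraction = TRAIN_IMAGES / total_images
test_fraction = TEST_IMAGES / total_images

print(f"total: {total_images}")  # total: 51839
print(f"train: {train_fraction:.1%}, test: {test_fraction:.1%}")  # train: 75.6%, test: 24.4%
```

The split is therefore roughly three quarters training data to one quarter test data.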

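Dataset Creation

Curation Rationale

Recognition of traffic signs is a challenging real-world problem of high industrial relevance. Although commercial systems have reached the market and many studies on this topic have been published, systematic unbiased comparisons of different approaches are still missing, and comprehensive benchmark datasets are not freely available.

Traffic sign recognition is a multi-class classification problem with unbalanced class frequencies. Traffic signs show a wide range of variation between classes in terms of color, shape, and the presence of pictograms or text. However, there are subsets of classes (for example, speed limit signs) that are very similar to each other.

The classifier has to cope with large variations in visual appearance due to illumination changes, partial occlusions, rotations, weather conditions, and so on.

Humans are capable of recognizing the large variety of existing road signs with close to 100% correctness. This applies not only to real-world driving, which provides both context and multiple views of a single traffic sign, but also to recognition from single images.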

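Collected summary

Dataset introduction

Construction

The GTSRB dataset was built in response to the real-world challenge of traffic sign recognition. By defining a multi-class, single-image classification task, it aims to give researchers a fair platform for comparing methods. The dataset comprises more than 50,000 images in more than 40 classes, with class labels assigned through manual annotation, which ensures the diversity and difficulty of the data.

Characteristics

GTSRB poses a multi-class, single-image classification problem covering many kinds of traffic signs. The dataset is large and reflects the variability of real application scenarios, including illumination changes, partial occlusions, rotations, and different weather conditions. In addition, the class distribution is unbalanced, which adds a further challenge for training and evaluating algorithms.

Usage

The dataset is divided into a training set of 39,209 images and a test set of 12,630 images. Using the fields in the dataset (image width and height, region-of-interest coordinates, class ID, and path), users can load and preprocess the images. The dataset is suited to training and evaluating image classification models, particularly in applied research on traffic sign recognition.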
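
The loading step described above might look like the following minimal sketch. It assumes the annotations are available as CSV text whose header matches the field names in the card; the card itself does not describe the file layout, so the CSV format, the helper names, and the second and third sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an annotation file. The first data row is the card's example;
# the other two rows are made up purely for illustration.
SAMPLE_CSV = """Width,Height,Roi.X1,Roi.Y1,Roi.X2,Roi.Y2,ClassId,Path
31,31,6,6,26,26,20,Train/20/00020_00004_00002.png
40,42,5,5,35,37,1,Train/1/example_a.png
52,50,6,5,46,45,1,Train/1/example_b.png
"""

INT_FIELDS = ("Width", "Height", "Roi.X1", "Roi.Y1", "Roi.X2", "Roi.Y2", "ClassId")


def load_annotations(handle):
    """Read annotation rows from a text handle, converting numeric fields to int."""
    records = []
    for row in csv.DictReader(handle):
        record = {name: int(row[name]) for name in INT_FIELDS}
        record["Path"] = row["Path"]
        records.append(record)
    return records


def group_paths_by_class(records):
    """Map each ClassId to the list of image paths labeled with it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["ClassId"]].append(record["Path"])
    return dict(grouped)


records = load_annotations(io.StringIO(SAMPLE_CSV))
paths_by_class = group_paths_by_class(records)
print(len(records), sorted(paths_by_class))
```

With a real annotation file, the `io.StringIO` wrapper would be replaced by an open file handle, and the paths would then be resolved against the directory that holds the images.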

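Background and challenges

Background overview

GTSRB, short for German Traffic Sign Benchmark, is the benchmark dataset of the multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. It was created by a research team at the Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, to provide a large-scale, realistic benchmark for traffic sign recognition and to support comparative studies of machine learning algorithms in this area. The dataset covers more than 50,000 images in 42 classes and captures the variation among traffic signs in color, shape, and the presence of pictograms or text. GTSRB has had a notable influence on the field and has become a standard dataset for evaluating and comparing machine learning algorithms on traffic sign recognition.

Current challenges

GTSRB raises several challenges in both its construction and its use. First, traffic sign recognition is a multi-class classification problem with an unbalanced class distribution, so algorithms must be able to handle that imbalance. Second, traffic signs vary widely in visual appearance because of illumination changes, partial occlusions, rotations, and weather conditions, which tests how well algorithms generalize. Finally, ensuring the quality and consistency of annotations during construction, and handling personal and sensitive information, are also important problems to address.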
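
One common way to handle an unbalanced class distribution is to weight each class inversely to its frequency. The sketch below is not taken from the dataset card; it uses the heuristic weight = n_samples / (n_classes * class_count), and the toy label list is invented for illustration rather than drawn from GTSRB.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_id: weight} using n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so that each class contributes
    equally to a weighted loss in total.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        class_id: n_samples / (n_classes * count)
        for class_id, count in counts.items()
    }


# Toy labels (not GTSRB data): class 0 is frequent, class 2 is rare.
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(toy_labels)

for class_id in sorted(weights):
    print(class_id, round(weights[class_id], 3))
```

With real data, the labels would be the `ClassId` values of the training records, and the resulting weights could be passed to a weighted loss function or used to build a sampler.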

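Common scenarios

Classic use cases

In research on intelligent transportation systems, GTSRB is a classic testbed for single-image, multi-class classification because of the variety and number of traffic signs it contains. The dataset includes more than 50,000 images covering more than 40 traffic sign classes, and it provides per-image attributes (image size, region of interest, class ID, and path) that give a solid basis for training and evaluating machine learning models.

Derived work

GTSRB has given rise to a large body of academic work, including research on traffic sign recognition algorithms, optimization of deep learning models, and applications of data augmentation. This work has broadened the scope of intelligent transportation research and helped move related technologies toward commercial and industrial use.

Recent research on the dataset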