SbNet Signboard Detection and Classification Dataset

Name: SbNet Signboard Detection and Classification Dataset
Creator: Harvard Dataverse
Published: 2025-04-01 17:19:06
License: No description available

Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-15.

Download link:

https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/UOT0RE


Description:

<h2>Phase 1: Signboard Detection Dataset</h2>

This phase focuses on detecting signboards in street images.

- **Total Images:** 8,366
- **Image Format:** JPG (8,366 images)
- **Resolution:**
  - Minimum: (720, 443)
  - Maximum: (9,280, 8,285)
  - Mean: (4,202, 3,138)
  - Median: (4,032, 3,024)
- **Aspect Ratio:**
  - Minimum: 0.5625
  - Maximum: 5.7043
  - Mean: 1.3691
  - Most Frequent: 1.3333
  - Standard Deviation: 0.2329
- **File Size (KB):**
  - Minimum: 88.19 KB
  - Maximum: 41,266.50 KB
  - Mean: 5,796.19 KB
  - Total Dataset Size: 48,490,924.91 KB
- **Color Statistics:**
  - Color Mode: RGB (8,366 images)
  - Mean Color (RGB): (110.32, 112.77, 118.16)
  - Standard Deviation (RGB): (65.71, 65.36, 65.82)
- **Brightness:**
  - Average: 114.10

---

<h2>Phase 2: Region of Text Interest (RTI) Detection Dataset</h2>

This phase focuses on detecting specific text regions (names and addresses) within signboards.

- **Total Images:** 8,036
- **Image Format:** JPG (8,036 images)
- **Resolution:**
  - Minimum: (552, 156)
  - Maximum: (9,228, 4,682)
  - Mean: (2,753, 808)
  - Median: (2,741, 781)
- **Aspect Ratio:**
  - Minimum: 0.9615
  - Maximum: 11.3835
  - Mean: 3.6058
  - Most Frequent: 4.0
  - Standard Deviation: 1.2475
- **File Size (KB):**
  - Minimum: 40.54 KB
  - Maximum: 7,968.94 KB
  - Mean: 653.67 KB
  - Total Dataset Size: 5,252,868.26 KB
- **Color Statistics:**
  - Color Mode: RGB (8,036 images)
  - Mean Color (RGB): (137.58, 136.29, 144.00)
  - Standard Deviation (RGB): (47.26, 49.73, 50.89)
- **Brightness:**
  - Average: 138.74

---

<h2>Named Entity Recognition (NER) Dataset</h2>

This dataset is used for categorizing extracted text from signboards.

- **Total Entries:** 42,547
- **Unique Categories:** 10
- **Category Distribution:**
  - Religious Sites: 10,641
  - Retail Outlets: 8,275
  - Educational Institutions: 6,826
  - Healthcare Institutions: 4,708
  - Restaurants: 3,868
  - Pharmacies: 3,637
  - Parks: 1,547
  - Banks: 1,121
  - Stations: 1,094
  - Hotels: 830

#### **Word Count Statistics:**

- **Overall Word Count:**
  - Maximum: 18
  - Minimum: 1
  - Mean: 3.82
- **Category-Wise Word Count:**
  - **Banks:** Mean: 4.65, Max: 11, Min: 1
  - **Educational Institutions:** Mean: 4.60, Max: 18, Min: 1
  - **Healthcare Institutions:** Mean: 4.02, Max: 16, Min: 1
  - **Religious Sites:** Mean: 4.36, Max: 17, Min: 1
  - **Retail Outlets:** Mean: 3.08, Max: 15, Min: 1
  - **Restaurants:** Mean: 3.36, Max: 13, Min: 1
  - **Pharmacies:** Mean: 2.91, Max: 13, Min: 1
  - **Parks:** Mean: 3.10, Max: 11, Min: 1
  - **Stations:** Mean: 3.72, Max: 17, Min: 1
  - **Hotels:** Mean: 3.12, Max: 12, Min: 1

This dataset is structured for a two-phase object detection pipeline with an additional text classification task to categorize extracted text from detected regions.
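The NER figures above can be cross-checked against each other. The following is a minimal sketch, not part of the dataset's own tooling: it hard-codes the category counts and per-category mean word counts from the description, then recomputes the total entry count, the count-weighted mean word count, and the ratio between the largest and smallest class. The names `CATEGORIES`, `total_entries`, `weighted_mean_words`, and `imbalance_ratio` are illustrative and do not come from the dataset.

```python
"""Sanity checks on the NER statistics quoted in the dataset description."""

# (entry count, mean word count) per category, copied from the description.
# The dictionary layout itself is an assumption made for this sketch.
CATEGORIES = {
    "Religious Sites": (10_641, 4.36),
    "Retail Outlets": (8_275, 3.08),
    "Educational Institutions": (6_826, 4.60),
    "Healthcare Institutions": (4_708, 4.02),
    "Restaurants": (3_868, 3.36),
    "Pharmacies": (3_637, 2.91),
    "Parks": (1_547, 3.10),
    "Banks": (1_121, 4.65),
    "Stations": (1_094, 3.72),
    "Hotels": (830, 3.12),
}

REPORTED_TOTAL = 42_547       # "Total Entries" in the description
REPORTED_MEAN_WORDS = 3.82    # overall mean word count in the description


def total_entries(categories):
    """Sum the per-category entry counts."""
    return sum(count for count, _ in categories.values())


def weighted_mean_words(categories):
    """Mean word count over all entries, weighting each category by its size."""
    weighted = sum(count * mean for count, mean in categories.values())
    return weighted / total_entries(categories)


def imbalance_ratio(categories):
    """Ratio of the largest class size to the smallest class size."""
    counts = [count for count, _ in categories.values()]
    return max(counts) / min(counts)


if __name__ == "__main__":
    print("total entries:", total_entries(CATEGORIES))
    print("weighted mean words:", round(weighted_mean_words(CATEGORIES), 2))
    print("largest/smallest class ratio:", round(imbalance_ratio(CATEGORIES), 1))
```

With these inputs the category counts sum to the reported 42,547 entries, and the weighted mean agrees with the reported overall mean of 3.82 to within rounding. The largest class (Religious Sites) is roughly 13 times the size of the smallest (Hotels), which may be worth accounting for when training a classifier on this data.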

Provider:

Harvard Dataverse

Created:

2025-03-30
