
SbNet Signboard Detection and Classification Dataset

Source: DataCite Commons. Updated 2025-04-01; indexed 2025-04-15.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/UOT0RE
Description:
<h2>Phase 1: Signboard Detection Dataset</h2>
<i>This phase focuses on detecting signboards in street images.</i>

- **Total Images:** 8,366
- **Image Format:** JPG (8,366 images)
- **Resolution:**
  - Minimum: (720, 443)
  - Maximum: (9,280, 8,285)
  - Mean: (4,202, 3,138)
  - Median: (4,032, 3,024)
- **Aspect Ratio:**
  - Minimum: 0.5625
  - Maximum: 5.7043
  - Mean: 1.3691
  - Most Frequent: 1.3333
  - Standard Deviation: 0.2329
- **File Size (KB):**
  - Minimum: 88.19 KB
  - Maximum: 41,266.50 KB
  - Mean: 5,796.19 KB
  - Total Dataset Size: 48,490,924.91 KB
- **Color Statistics:**
  - Color Mode: RGB (8,366 images)
  - Mean Color (RGB): (110.32, 112.77, 118.16)
  - Standard Deviation (RGB): (65.71, 65.36, 65.82)
- **Brightness:**
  - Average: 114.10

---

<h2>Phase 2: Region of Text Interest (RTI) Detection Dataset</h2>
<i>This phase focuses on detecting specific text regions (names and addresses) within signboards.</i>

- **Total Images:** 8,036
- **Image Format:** JPG (8,036 images)
- **Resolution:**
  - Minimum: (552, 156)
  - Maximum: (9,228, 4,682)
  - Mean: (2,753, 808)
  - Median: (2,741, 781)
- **Aspect Ratio:**
  - Minimum: 0.9615
  - Maximum: 11.3835
  - Mean: 3.6058
  - Most Frequent: 4.0
  - Standard Deviation: 1.2475
- **File Size (KB):**
  - Minimum: 40.54 KB
  - Maximum: 7,968.94 KB
  - Mean: 653.67 KB
  - Total Dataset Size: 5,252,868.26 KB
- **Color Statistics:**
  - Color Mode: RGB (8,036 images)
  - Mean Color (RGB): (137.58, 136.29, 144.00)
  - Standard Deviation (RGB): (47.26, 49.73, 50.89)
- **Brightness:**
  - Average: 138.74

---

<h2>Named Entity Recognition (NER) Dataset</h2>
<i>This dataset is used for categorizing extracted text from signboards.</i>

- **Total Entries:** 42,547
- **Unique Categories:** 10
- **Category Distribution:**
  - Religious Sites: 10,641
  - Retail Outlets: 8,275
  - Educational Institutions: 6,826
  - Healthcare Institutions: 4,708
  - Restaurants: 3,868
  - Pharmacies: 3,637
  - Parks: 1,547
  - Banks: 1,121
  - Stations: 1,094
  - Hotels: 830

#### **Word Count Statistics:**

- **Overall Word Count:**
  - Maximum: 18
  - Minimum: 1
  - Mean: 3.82
- **Category-Wise Word Count:**
  - **Banks:** Mean: 4.65, Max: 11, Min: 1
  - **Educational Institutions:** Mean: 4.60, Max: 18, Min: 1
  - **Healthcare Institutions:** Mean: 4.02, Max: 16, Min: 1
  - **Religious Sites:** Mean: 4.36, Max: 17, Min: 1
  - **Retail Outlets:** Mean: 3.08, Max: 15, Min: 1
  - **Restaurants:** Mean: 3.36, Max: 13, Min: 1
  - **Pharmacies:** Mean: 2.91, Max: 13, Min: 1
  - **Parks:** Mean: 3.10, Max: 11, Min: 1
  - **Stations:** Mean: 3.72, Max: 17, Min: 1
  - **Hotels:** Mean: 3.12, Max: 12, Min: 1

This dataset is structured for a two-phase object detection pipeline with an additional text classification task to categorize extracted text from detected regions.
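
The published figures can be cross-checked against one another. The sketch below is not part of the dataset or its tooling; it simply copies the numbers from the description above and tests three things: that the category counts sum to the stated total, that the count-weighted average of the per-category mean word counts reproduces the overall mean, and that image count times mean file size lands within rounding distance of the stated total size. The function names and the tolerance rule (half a unit in the last reported decimal of the mean, per image) are assumptions made for illustration.

```python
# Figures copied from the dataset description; nothing here reads the actual files.
CATEGORY_COUNTS = {
    "Religious Sites": 10_641,
    "Retail Outlets": 8_275,
    "Educational Institutions": 6_826,
    "Healthcare Institutions": 4_708,
    "Restaurants": 3_868,
    "Pharmacies": 3_637,
    "Parks": 1_547,
    "Banks": 1_121,
    "Stations": 1_094,
    "Hotels": 830,
}

MEAN_WORDS = {
    "Banks": 4.65,
    "Educational Institutions": 4.60,
    "Healthcare Institutions": 4.02,
    "Religious Sites": 4.36,
    "Retail Outlets": 3.08,
    "Restaurants": 3.36,
    "Pharmacies": 2.91,
    "Parks": 3.10,
    "Stations": 3.72,
    "Hotels": 3.12,
}


def weighted_mean(values, weights):
    """Average of values[k], weighted by weights[k], over the keys of values."""
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight


def size_gap_kb(n_images, mean_kb, stated_total_kb):
    """Absolute gap between n_images * mean_kb and the stated total, in KB."""
    return abs(n_images * mean_kb - stated_total_kb)


def rounding_tolerance_kb(n_images, decimals=2):
    """Largest gap explainable by rounding the mean to `decimals` places (assumed rule)."""
    return n_images * 0.5 * 10 ** (-decimals)


category_total = sum(CATEGORY_COUNTS.values())
overall_mean_words = weighted_mean(MEAN_WORDS, CATEGORY_COUNTS)

phase1_gap = size_gap_kb(8_366, 5_796.19, 48_490_924.91)
phase2_gap = size_gap_kb(8_036, 653.67, 5_252_868.26)

print("NER entries:", category_total)
print("Overall mean word count:", round(overall_mean_words, 2))
print("Phase 1 size consistent:", phase1_gap <= rounding_tolerance_kb(8_366))
print("Phase 2 size consistent:", phase2_gap <= rounding_tolerance_kb(8_036))
```

Run as written, the script reports 42547 entries and a mean word count of 3.82, matching the stated totals, and both file-size checks pass within the assumed rounding tolerance.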
Provider:
Harvard Dataverse
Created:
2025-03-30