Harrison-xie/GSO
Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Harrison-xie/GSO
Resource description:
---
pretty_name: GSO-Net
task_categories:
- object-detection
- image-classification
tags:
- computer-vision
- industrial-vision
- safety-monitoring
- sop-compliance
- benchmark
size_categories:
- 10K<n<100K
annotations_creators:
- expert-generated
language:
- en
license: other
---
# GSO-Net
> GSO-Net studies SOP understanding under sparse industrial polling, where procedural status must be inferred from localized operational evidence rather than dense temporal continuity.
GSO-Net is a large-scale benchmark for visual Standard Operating Procedure (SOP) understanding in petrochemical unloading scenarios. It is designed for realistic industrial deployment, where monitoring often relies on sparse round-robin visual polling rather than dense continuous video.
The dataset contains **50,325 independently sampled frames** collected from **63 real petrochemical stations**. It adopts a hierarchical annotation scheme linking **9 macroscopic procedural steps** with **15 microscopic operational states**, enabling evaluation of both procedural understanding and localized state grounding.
## Highlights
- **Real industrial scenario:** petrochemical unloading under practical surveillance conditions
- **Hierarchical labels:** 9 macro steps + 15 micro states
- **Sparse-polling setting:** built for intermittent observation rather than dense video
- **Challenging benchmark factors:** small critical targets, long-tailed operational states, cross-site variation, weather and illumination changes
- **Engineering-oriented focus:** evaluates whether models can support reliable SOP monitoring under deployment constraints
## Tasks
### Task 1: Joint Step-State Detection
The core task of GSO-Net.
Models jointly predict:
- **15 microscopic operational states**
- **9 macroscopic procedural steps**
This task evaluates whether a model can ground localized operational evidence and relate it to procedural stages.
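One way to picture a joint prediction is as a detection record that carries both label levels. The sketch below is illustrative only: the `Detection` class, its field names, the 0-based index ranges, the `(x_min, y_min, x_max, y_max)` box convention, and the choice to attach a step label to each detection are all assumptions, not the dataset's published label format.

```python
from dataclasses import dataclass

NUM_MICRO_STATES = 15  # from the dataset card
NUM_MACRO_STEPS = 9    # from the dataset card


@dataclass(frozen=True)
class Detection:
    """Hypothetical joint prediction for one localized region."""
    box: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    micro_state: int  # assumed 0-based index into the 15 operational states
    macro_step: int   # assumed 0-based index into the 9 procedural steps
    score: float      # model confidence in [0, 1]


def is_valid(det: Detection) -> bool:
    """Check index ranges, score range, and that the box has positive area."""
    x_min, y_min, x_max, y_max = det.box
    return (
        0 <= det.micro_state < NUM_MICRO_STATES
        and 0 <= det.macro_step < NUM_MACRO_STEPS
        and 0.0 <= det.score <= 1.0
        and x_max > x_min
        and y_max > y_min
    )
```

A check like `is_valid` can catch out-of-range labels before predictions are scored, whatever the actual annotation format turns out to be.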
### Task 2: Frame-Level Step Classification
A diagnostic reference task.
Models classify one of the **9 macroscopic SOP steps** from the whole image using global visual evidence only.
This task is intentionally weaker than Task 1 and is included to show the limitations of holistic classification without explicit localized state grounding.
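The card does not name an evaluation metric for this task. As a minimal sketch, assuming integer step labels in `0..8`, frame-level predictions could be scored with overall accuracy plus mean per-step recall, which is less dominated by frequent steps. The function name `step_accuracy` and both metrics are illustrative choices, not the benchmark's official protocol.

```python
from collections import Counter

NUM_STEPS = 9  # macroscopic SOP steps, from the dataset card


def step_accuracy(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (overall accuracy, mean per-step recall) over frames.

    Steps with no ground-truth frames are left out of the mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    if any(not 0 <= t < NUM_STEPS for t in y_true):
        raise ValueError("ground-truth labels must lie in 0..NUM_STEPS-1")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(hits.values()) / len(y_true)
    per_step = [hits[s] / totals[s] for s in range(NUM_STEPS) if totals[s]]
    return overall, sum(per_step) / len(per_step)
```

For example, `step_accuracy([0, 0, 0, 1], [0, 0, 0, 2])` gives an overall accuracy of 0.75 but a mean per-step recall of 0.5, because the single frame of step 1 is missed.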
## Dataset Scale
- **Total images:** 50,325
- **Stations:** 63
- **Bounding boxes:** 321,432
- **Training images:** 40,976
- **Validation images:** 9,349
- **Split protocol:** strict cross-site split
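The figures above can be sanity-checked with a few lines of arithmetic, and the strict cross-site protocol implies that no station contributes frames to both splits. The sketch below does both; the `shared_stations` helper and its image-to-station mappings are hypothetical, since the card does not say how station IDs are stored.

```python
TRAIN_IMAGES = 40_976
VAL_IMAGES = 9_349
TOTAL_IMAGES = 50_325
TOTAL_BOXES = 321_432

# The two splits should account for every image.
assert TRAIN_IMAGES + VAL_IMAGES == TOTAL_IMAGES

train_fraction = TRAIN_IMAGES / TOTAL_IMAGES   # about 0.814
boxes_per_image = TOTAL_BOXES / TOTAL_IMAGES   # about 6.39


def shared_stations(train_sites: dict[str, str], val_sites: dict[str, str]) -> set[str]:
    """Return station IDs that appear in both splits.

    Each argument maps an image name to a station ID (a hypothetical layout).
    An empty result is what a strict cross-site split should produce.
    """
    return set(train_sites.values()) & set(val_sites.values())
```

The split is therefore roughly 81% training to 19% validation by image count, with a little over six annotated boxes per image on average.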
## Why GSO-Net?
Existing industrial benchmarks mainly focus on anomaly detection, object presence, or dense procedural parsing. GSO-Net instead targets **hazardous SOP understanding under sparse industrial polling**, where procedural status must be inferred from incomplete observations and localized visual evidence.
The benchmark is particularly challenging because:
- many decisive cues are **tiny and localized**
- operational states are **long-tailed**
- adjacent procedural stages may look **globally similar**
- stage boundaries often depend on **functional state transitions** rather than large scene changes
## Data Structure
A typical release includes:
```text
GSO-Net/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
├── annotations/
│   ├── instances_train2017/
│   └── instances_val2017/
├── splits
├── README.md
└── LICENSE
```
Provider:
Harrison-xie



