govtech/SynthSite

Name: govtech/SynthSite
Creator: govtech
Published: 2026-04-09 09:05:18
License: No description provided

Hugging Face | Updated 2026-04-09 | Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/govtech/SynthSite

Resource overview:

---
license: other
license_name: govtech-singapore
license_link: LICENSE
language:
- en
tags:
- video
- safety
- construction
- synthetic
- benchmark
- computer-vision
---

# SynthSite

**SynthSite** is a curated benchmark of **227 synthetic construction site safety videos** (115 unsafe, 112 safe) generated using four text-to-video models: `Sora 2 Pro`, `Veo 3.1`, `Wan 2.2-14B`, and `Wan 2.6`. Each video was independently labeled by **2–3 human reviewers** for the presence of a **Worker Under Suspended Load** hazard, producing a binary classification: **unsafe** (`True_Positive`: worker remains in the suspended-load fall zone) or **safe** (`False_Positive`: no worker in fall zone or load not truly suspended).

All content is fully AI-generated; no real people or real incidents are depicted. The terms "lifted load" and "suspended load" are used interchangeably throughout this repo.

---

## Sample Frames

Each video is a short clip (5–10 seconds, 1280×720, 16–30 FPS). Frames below are sampled from unsafe and safe videos.
### Unsafe: Worker Under Suspended Load

<table>
  <tr>
    <td align="center"><img src="assets/tp_sora_site.png" width="220"/></td>
    <td align="center"><img src="assets/tp_sora_steel.png" width="220"/></td>
    <td align="center"><img src="assets/tp_veo_night.png" width="220"/></td>
  </tr>
  <tr>
    <td align="center"><img src="assets/tp_veo_rain.png" width="220"/></td>
    <td align="center"><img src="assets/tp_wan22_pipe.png" width="220"/></td>
    <td align="center"><img src="assets/tp_wan26_rebar.png" width="220"/></td>
  </tr>
</table>

### Safe: No Hazard Present

<table>
  <tr>
    <td align="center"><img src="assets/fp_sora_site.png" width="220"/></td>
    <td align="center"><img src="assets/fp_veo_day.png" width="220"/></td>
    <td align="center"><img src="assets/fp_veo_sunset.png" width="220"/></td>
  </tr>
  <tr>
    <td align="center"><img src="assets/fp_veo_night.png" width="220"/></td>
    <td align="center"><img src="assets/fp_wan22_quiet.png" width="220"/></td>
    <td align="center"><img src="assets/fp_wan26_tunnel.png" width="220"/></td>
  </tr>
</table>

---

## Generation

Videos were generated from four models:

| Model | Provider | Model License | Output Rights | Videos |
|-------|----------|---------------|---------------|--------|
| `Sora 2 Pro` | OpenAI | Proprietary (API ToS) | Assigned to user | 6 |
| `Veo 3.1` | Google | Proprietary (API ToS) | Not claimed by Google | 124 |
| `Wan 2.2-14B` | Alibaba / Wan-AI | Apache 2.0 | Not claimed by creators | 77 |
| `Wan 2.6` | Alibaba / Wan-AI | Apache 2.0 | Not claimed by creators | 20 |

All four models either assign output rights to the user or explicitly disclaim ownership of generated content, permitting redistribution under this dataset's license. The terms of service for each model were reviewed and confirmed to be compatible with open dataset release for academic and research use.

No real video footage is included.
The generation process used text intermediaries that sufficiently abstract away details from any source material, preventing re-identification. Provenance metadata (C2PA for Sora, SynthID for Veo) has been preserved.

From an initial pool of **487** candidates, each video was manually screened before hazard annotation. Videos were excluded if they contained synthesis failures that prevented reliable hazard judgment, including severe artifacts, missing or fused workers/loads/machinery, abrupt appearance or disappearance of key entities, or major temporal corruption. Minor imperfections that did not hinder hazard interpretation (e.g., mild geometric distortion, low resolution, or corrupted text overlays) were retained. **227 videos (47%)** passed screening and form the final benchmark.

### Gemini 2.5 Flash Evaluation

Each video was assessed using a rubric-guided Gemini 2.5 Flash pipeline with schema-constrained structured outputs, characterizing diversity, complexity, and realism. The evaluation script, rubric, and per-video results are in `code/gemini_synthsite_eval.py` and `results/gemini_synthsite_eval/`.

### VBench Statistics

Scores are normalized to [0, 1], with higher values indicating better temporal stability.

| Metric | Mean | Median | Std |
|---|---|---|---|
| Subject consistency | 0.9770 | 0.9826 | 0.0191 |
| Background consistency | 0.9701 | 0.9769 | 0.0191 |
| Motion smoothness | 0.9952 | 0.9962 | 0.0024 |
| Temporal flickering | 0.9903 | 0.9935 | 0.0127 |

---

## Inter-Annotator Agreement

| Scope | Videos | Weighted Avg Cohen's Kappa | Krippendorff's Alpha |
|-------|--------|---------------------------|----------------------|
| Global | 227 | 0.20 | 0.42 |
| Tier 1 | 150 | 1.00 | 1.00 |
| Tier 2 | 77 | -0.24 | -0.33 |

The low global Kappa (0.20) is expected: it is driven by the intentionally included ambiguous Tier 2 videos.
Tier 1 achieves perfect agreement (1.00), while Tier 2 is below chance, confirming these videos are genuinely ambiguous for human reviewers. Pairwise metrics are available in `results/annotators_agreement/cohens_kappa.csv`.

---

## Tier System

Videos are assigned to tiers based on inter-annotator agreement:

| Tier | Condition | Count | Interpretation |
|------|-----------|-------|----------------|
| Tier 1 | All reviewers agree | **150** (66%) | High-confidence ground truth |
| Tier 2 | Reviewers disagree | **77** (34%) | Ambiguous: genuine disagreement |

Systems can be scored primarily on Tier 1 (reliable ground truth), with Tier 2 performance reported separately as the hard set.

---

## Dataset Structure

```
SynthSite/
├── LICENSE
├── README.md
├── synthetic_video_labels.csv
├── assets/                          # Sample frame images for README
├── videos/
│   └── *.mp4                        (227 files)
├── results/
│   ├── annotators_agreement/
│   │   ├── cohens_kappa.csv
│   │   ├── krippendorffs_alpha.csv
│   │   └── summary.csv
│   ├── detector_outcome/
│   │   ├── confusion_matrix_grounding_dino.csv
│   │   ├── confusion_matrix_yolo_world.csv
│   │   ├── results_grounding_dino.csv
│   │   └── results_yolo_world.csv
│   ├── gemini_synthsite_eval/
│   │   └── *.json                   (227 files: Gemini 2.5 Flash per-video assessments)
│   └── vbench_eval/
│       ├── synthsite_eval_results.json
│       └── synthsite_scores.csv
├── code/
│   ├── gemini_synthsite_eval.py     # Gemini video evaluation pipeline
│   ├── gemini_agg_analysis.py       # Aggregate Gemini outputs into tables/figures
│   └── hazard_detector_eval.py      # YOLO World + Grounding DINO evaluation
└── docs/
    ├── detection_design.md          # Technical reference on detection approach
    ├── detection_parameters.md      # Filter cascade design and calibration
    ├── detector_scores.md           # Per-tier, per-generator detection results
    └── vbench_scores.md             # VBench quality metrics for synthetic videos
```

### Fields (`synthetic_video_labels.csv`)

| Field | Type | Description |
|-------|------|-------------|
| `filename` | string | Video filename (matches file in `videos/`) |
| `num_labelers` | int | Number of reviewers (2–3) |
| `tier` | int | Agreement tier: `1` (agreement) or `2` (disagreement) |
| `resolved_label` | string | Majority-vote label: `True_Positive` or `False_Positive` |
| `labeler_N_name` | string | Anonymized reviewer ID (`Reviewer_01` through `Reviewer_08`) |
| `labeler_N_label` | string | `True_Positive` or `False_Positive` |

Columns repeat for N = 1 to 3. A 3rd reviewer is present only when the first two disagreed (tiebreaker). Empty values indicate fewer reviewers.

---

## Intended Uses

- Benchmarking computer vision systems for construction site hazard detection
- Studying the effectiveness of synthetic data for safety-critical applications
- Evaluating inter-annotator agreement on ambiguous safety scenarios
- Comparing video generation models for domain-specific content creation

## Out-of-Scope Uses

- Misrepresenting synthetic content as real incident footage
- Training systems intended to cause harm or circumvent safety measures
- Substituting for real-world safety assessments or compliance evaluations

---

## Ethics

- **No real people** appear in any video; all content is AI-generated
- **No real incidents** are depicted; scenarios are synthetic constructions
- Source material was abstracted through text intermediaries, preventing re-identification
- Generative model terms of service were reviewed and complied with
- Provenance metadata (C2PA, SynthID) preserved for transparency
- Reviewer identities have been anonymized

---

## License

This dataset is released under a custom GovTech Singapore license. See [`LICENSE`](LICENSE) for full terms.

---

## Getting Started

This repository uses **Git LFS** for large files (videos, images). To clone with all assets:

```bash
git lfs install
git clone <repo-url>
```

Without Git LFS, video and image files will be downloaded as small pointer files.
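The field table above can be exercised with a short Python sketch that reads the label file and groups videos by tier. This is only an illustration under stated assumptions: the concrete column names `labeler_1_label` through `labeler_3_label` are inferred from the `labeler_N_label` pattern, the sample rows are hypothetical, and the helper names `reviewer_labels` and `split_by_tier` are not part of the repository.

```python
import csv
import io


def reviewer_labels(row):
    """Collect the non-empty reviewer labels for one video (N = 1 to 3).

    Column names are assumed to follow the `labeler_N_label` pattern.
    """
    labels = [row.get(f"labeler_{n}_label", "") for n in (1, 2, 3)]
    return [lab for lab in labels if lab]


def split_by_tier(rows):
    """Group filenames by the value of the `tier` column (1 or 2)."""
    tiers = {1: [], 2: []}
    for row in rows:
        tiers[int(row["tier"])].append(row["filename"])
    return tiers


# Hypothetical rows for illustration; the real data lives in
# synthetic_video_labels.csv at the repository root.
SAMPLE = (
    "filename,num_labelers,tier,resolved_label,"
    "labeler_1_label,labeler_2_label,labeler_3_label\n"
    "clip_a.mp4,2,1,True_Positive,True_Positive,True_Positive,\n"
    "clip_b.mp4,3,2,False_Positive,True_Positive,False_Positive,False_Positive\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
tiers = split_by_tier(rows)

for row in rows:
    labels = reviewer_labels(row)
    unanimous = len(set(labels)) == 1
    print(row["filename"], labels, "unanimous" if unanimous else "split")
```

To run it against the real file, replace `io.StringIO(SAMPLE)` with `open("synthetic_video_labels.csv", newline="")` after cloning with Git LFS.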

Provider:

govtech

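Collected Summary

Dataset Introduction

Construction Method

SynthSite sits at the intersection of computer vision and construction safety and was built through a multi-stage pipeline. The team used four text-to-video models (Sora 2 Pro, Veo 3.1, Wan 2.2-14B, and Wan 2.6) to generate an initial pool of 487 synthetic videos from text descriptions of construction safety scenarios. Manual screening retained the 227 videos free of synthesis failures that would prevent reliable hazard judgment. Two to three reviewers then independently labeled each video for the presence of a "worker under suspended load" hazard, yielding a binary classification label. All content is AI-generated and depicts no real people or events.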

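Features

The dataset's defining feature is its quality tiering and inter-annotator agreement analysis. Videos are split into two tiers by reviewer agreement: Tier 1 holds 150 high-confidence samples on which all reviewers agree, and Tier 2 holds 77 ambiguous cases on which reviewers disagree, reflecting the inherent ambiguity of human judgment in safety scenarios. The dataset preserves generation provenance metadata (C2PA for Sora, SynthID for Veo) and ships detailed inter-annotator agreement metrics, giving a multi-dimensional benchmark for studying the reliability of synthetic data in safety-critical applications.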
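The agreement metrics behind the two tiers can be illustrated with a minimal Python sketch of Cohen's kappa for a single reviewer pair. This is the textbook unweighted formula, not the dataset's own script: the weighted average across reviewer pairs reported in the README is not reproduced here, and the toy label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Degenerate case (both raters used one identical label throughout);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


TP, FP = "True_Positive", "False_Positive"

# Hypothetical pair that always agrees, as in a Tier 1 style subset.
agree = cohens_kappa([TP, TP, FP, FP], [TP, TP, FP, FP])
# Hypothetical pair that always disagrees, as in a Tier 2 style subset.
disagree = cohens_kappa([TP, FP, TP, FP], [FP, TP, FP, TP])

print(agree)     # 1.0
print(disagree)  # -1.0
```

Full agreement yields 1.0 and systematic disagreement yields a negative value, which is the same direction as the Tier 1 and Tier 2 rows in the README's agreement table.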

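Usage

SynthSite primarily serves algorithm evaluation for construction safety hazard detection and research on the effectiveness of synthetic data. Users can parse the label file (`synthetic_video_labels.csv`) together with the files in `videos/` to build training or test sets for computer vision models. Tier 1 is recommended as the reliable baseline for measuring model performance, with Tier 2 used to analyze robustness on ambiguous scenes. The bundled agreement metric files support deeper analysis of annotation ambiguity. Because the data is synthetic, it cannot substitute for real-world safety assessments.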
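The recommendation to score Tier 1 and Tier 2 separately could look like the following Python sketch. It assumes label rows shaped like `synthetic_video_labels.csv` (`filename`, `tier`, `resolved_label`) and a hypothetical `predictions` mapping from filename to a boolean hazard flag; the function name `score_by_tier` and the choice of metrics are illustrative, not the repository's evaluation code.

```python
from collections import defaultdict

UNSAFE = "True_Positive"  # dataset label for an unsafe video


def score_by_tier(labels, predictions):
    """Return {tier: {"n", "accuracy", "precision", "recall"}}.

    labels: iterable of dicts with "filename", "tier", "resolved_label".
    predictions: dict mapping filename -> True if the system flags a hazard.
    """
    tally = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for row in labels:
        truth = row["resolved_label"] == UNSAFE
        pred = predictions[row["filename"]]
        if pred:
            key = "tp" if truth else "fp"
        else:
            key = "fn" if truth else "tn"
        tally[int(row["tier"])][key] += 1

    report = {}
    for tier, c in sorted(tally.items()):
        n = sum(c.values())
        flagged = c["tp"] + c["fp"]
        unsafe = c["tp"] + c["fn"]
        report[tier] = {
            "n": n,
            "accuracy": (c["tp"] + c["tn"]) / n,
            # None marks a metric that is undefined for this tier.
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / unsafe if unsafe else None,
        }
    return report


# Hypothetical labels and detector outputs for illustration only.
labels = [
    {"filename": "a.mp4", "tier": "1", "resolved_label": "True_Positive"},
    {"filename": "b.mp4", "tier": "1", "resolved_label": "False_Positive"},
    {"filename": "c.mp4", "tier": "2", "resolved_label": "True_Positive"},
    {"filename": "d.mp4", "tier": "2", "resolved_label": "False_Positive"},
]
predictions = {"a.mp4": True, "b.mp4": False, "c.mp4": False, "d.mp4": True}

report = score_by_tier(labels, predictions)
for tier, metrics in report.items():
    print(f"Tier {tier}: {metrics}")
```

Keeping the tiers in separate rows of the report avoids blending the reliable Tier 1 ground truth with the majority-vote labels of the ambiguous Tier 2 set.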

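Background and Challenges

Background Overview

As AI is applied more widely to industrial safety, efficient and reliable visual detection systems have become central to keeping construction sites safe. SynthSite was created by a research team at GovTech Singapore (the Government Technology Agency of Singapore) and published on Hugging Face in April 2026. It uses synthetic video to address the tension between data scarcity and privacy protection in conventional safety monitoring. The dataset targets one specific hazard, a worker under a lifted load. It contains 227 synthetic videos generated with text-to-video models such as Sora 2 Pro and Veo 3.1, each labeled by multiple human reviewers with a binary classification label. Its core research question is how effective synthetic data is in safety-critical scenarios. It is intended as a benchmark resource for computer vision models, in support of automated hazard detection and more standardized use of synthetic data in industry.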

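Current Challenges

SynthSite addresses automated detection of construction site hazards, where real surveillance footage is scarce because of privacy, safety, and acquisition cost. The distribution gap between synthetic and real scenes may still limit how well models generalize, and "worker under lifted load" is a fine-grained visual concept that is semantically ambiguous and sensitive to occlusion and viewpoint changes. Construction of the dataset raised two further challenges. First, screening the initial 487 candidates left only 47% of them (227 videos), which highlights how limited generative models still are at producing controllable domain-specific content. Second, reviewer agreement was low: about 34% of the videos (77 of 227) drew disagreement, reflecting the subjectivity of hazard judgments in synthetic scenes. This requires a tiered evaluation scheme that separates high-confidence samples from ambiguous ones, which makes benchmarking more complex.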
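The percentages quoted above follow directly from the counts given in the README, as this small Python check shows. All numbers are taken from the README; only the variable names are introduced here for illustration.

```python
# Counts reported in the README.
initial_pool = 487   # candidate videos before screening
retained = 227       # videos that passed screening
tier1, tier2 = 150, 77
unsafe, safe = 115, 112
videos_per_model = {
    "Sora 2 Pro": 6,
    "Veo 3.1": 124,
    "Wan 2.2-14B": 77,
    "Wan 2.6": 20,
}

retention_rate = retained / initial_pool
disagreement_rate = tier2 / retained

print(f"retention:    {retention_rate:.1%}")     # 46.6%, reported as 47%
print(f"disagreement: {disagreement_rate:.1%}")  # 33.9%, reported as 34%

# Each breakdown should add up to the 227 retained videos.
consistent = (
    tier1 + tier2 == retained
    and unsafe + safe == retained
    and sum(videos_per_model.values()) == retained
)
print("breakdowns consistent:", consistent)  # True
```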

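Common Scenarios

Typical Use Cases

In computer vision, SynthSite serves as a benchmark resource for research on construction site safety monitoring. Its synthetic videos simulate the "worker under suspended load" hazard, so researchers can systematically evaluate object detection and action recognition algorithms without needing footage of real accidents. The tiered annotation scheme separates high-confidence samples from ambiguous ones, which makes it possible to test model robustness.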

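Academic Problems Addressed

SynthSite mainly addresses the scarcity and high acquisition cost of real data in safety-critical domains. By providing a curated set of consistently annotated synthetic videos, it lets researchers study how effective synthetic data is for vision tasks, particularly with respect to model generalization and the assessment of annotation agreement. The dataset also supports research comparing human and machine ambiguity in labeling, offering an empirical basis for understanding how judgments differ in complex scenes.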

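Related Derivative Work

Follow-up work around SynthSite is reported to focus on synthetic data quality assessment and cross-model generalization. For example, some work examines how videos from different generative models (such as Sora and Veo) differ on safety detection tasks, and other work uses the tiered annotation structure to develop active learning frameworks for ambiguous samples. Together these efforts aim at a reliable practice for applying synthetic data in specialized vertical domains.

The above content was collected and summarized by 遇见数据集.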
