thomasht86/accident-conditions
Source: Hugging Face (updated 2026-03-24, indexed 2026-03-29)
Download link:
https://hf-mirror.com/datasets/thomasht86/accident-conditions
---
license: other
license_name: nlod-2.0
license_link: https://data.norge.no/nlod/en/2.0
language:
- "no"
tags:
- road-images
- norway
- trondheim
- traffic-accidents
- accident-conditions
- embeddings
- multimodal
- geospatial
- gemini
- image-editing
- synthetic-conditions
pretty_name: Norwegian Traffic Accident Scene Images with Embeddings
size_categories:
- 1K<n<10K
task_categories:
- image-feature-extraction
- image-classification
- zero-shot-image-classification
configs:
- config_name: default
data_files:
- split: train
path: data/train/**
dataset_info:
- config_name: default
features:
- name: image
dtype: image
- name: nvdb_id
dtype: int64
- name: accident_date
dtype: string
- name: accident_time
dtype: string
- name: year
dtype: int64
- name: month
dtype: int64
- name: day_of_week
dtype: string
- name: latitude
dtype: float64
- name: longitude
dtype: float64
- name: municipality_name
dtype: string
- name: municipality_number
dtype: int64
- name: urban_area
dtype: string
- name: road_reference
dtype: string
- name: road_type
dtype: string
- name: speed_limit
dtype: int64
- name: road_width
dtype: float64
- name: light_conditions
dtype: string
- name: weather
dtype: string
- name: road_surface_condition
dtype: string
- name: temperature
dtype: float64
- name: accident_type
dtype: string
- name: accident_code
dtype: string
- name: num_units
dtype: int64
- name: num_cars
dtype: int64
- name: num_trucks
dtype: int64
- name: num_buses
dtype: int64
- name: num_vans
dtype: int64
- name: num_mc
dtype: int64
- name: num_light_mc
dtype: int64
- name: num_mopeds
dtype: int64
- name: num_bicycles
dtype: int64
- name: num_pedestrians
dtype: int64
- name: num_escooters
dtype: int64
- name: image_timestamp
dtype: string
- name: image_lat
dtype: float64
- name: image_lon
dtype: float64
- name: image_heading
dtype: float64
- name: image_road_category
dtype: string
- name: image_road_number
dtype: int64
- name: image_lane
dtype: string
- name: image_detected_objects
dtype: string
- name: address_text
dtype: string
- name: image_distance_m
dtype: float64
- name: distance_km
dtype: float64
- name: embedding
sequence: float32
splits:
- name: train
num_examples: 3791
---
# Norwegian Traffic Accident Scene Images with Embeddings

A dataset of **~3,800 road images** depicting the environmental conditions at the time of real traffic accidents in the Trondheim region of Norway (2006–2024). Source images from [Statens vegvesen](https://www.vegvesen.no/) (Vegbilder) were, where needed, **AI-edited using Gemini** to realistically match the recorded accident conditions (lighting, weather, road surface), then paired with rich accident metadata and 3072-dimensional image embeddings.
## Dataset Description
- **Source images**: [Vegbilder](https://vegbilder.atlas.vegvesen.no/) (Statens vegvesen road camera images, 2025)
- **Accident data**: [NVDB](https://nvdbapiles-v3.atlas.vegvesen.no/) (Norwegian Road Database, traffic accidents 2006–2024)
- **License**: [NLOD 2.0](https://data.norge.no/nlod/en/2.0) (Norwegian Licence for Open Government Data) — free to use with attribution
- **Attribution**: Statens vegvesen / Norwegian Public Roads Administration
- **Area**: Trondheim, Norway (~40 km radius)
- **Image editing**: Gemini 3.1 Flash (image editing mode) — conditions applied based on accident metadata
- **Embeddings**: 3072-dimensional vectors from `gemini-embedding-2-preview`
## How It Was Built
Each accident in the NVDB database was matched to the **nearest road image** from Vegbilder (within 100 m). Where the accident occurred under environmental conditions different from those in the source image (e.g., nighttime, rain, snow/ice on the road), the image was **edited using Gemini** to realistically depict those conditions. Images whose conditions already matched (daylight, clear, dry) were used as-is.
```
Accident metadata (NVDB) → Match to nearest road image (Vegbilder WFS)
→ Edit image to accident conditions (Gemini)
→ Generate embedding (Gemini Batch API)
→ Upload to HuggingFace
```
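
The matching step amounts to a nearest-neighbour search with a distance cap. The sketch below is a minimal illustration, assuming great-circle (haversine) distance on WGS84 coordinates and an in-memory list of candidate images; `haversine_m` and `match_nearest_image` are hypothetical names, and the real pipeline queried the Vegbilder WFS service, which this does not reproduce.

```python
import math
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_nearest_image(
    accident: Tuple[float, float],
    images: Sequence[Tuple[str, float, float]],
    max_distance_m: float = 100.0,
) -> Optional[Tuple[str, float]]:
    """Return (image_id, distance_m) of the closest image within max_distance_m, else None."""
    best: Optional[Tuple[str, float]] = None
    for image_id, lat, lon in images:
        d = haversine_m(accident[0], accident[1], lat, lon)
        # Keep the candidate only if it is inside the cap and closer than the current best
        if d <= max_distance_m and (best is None or d < best[1]):
            best = (image_id, d)
    return best
```

A linear scan is fine for a few thousand accidents; for the larger image corpus a spatial index (or the WFS bounding-box query itself) would do the pre-filtering.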
## Dataset Structure
Each example contains:
### Image & Identifiers
| Field | Type | Description |
|---|---|---|
| `image` | Image | Road scene JPEG — edited to match accident conditions |
| `nvdb_id` | int | Unique accident ID from NVDB |
### Accident Time & Location
| Field | Type | Description |
|---|---|---|
| `accident_date` | string | Date of accident (ISO 8601) |
| `accident_time` | string | Time of accident (HH:MM) |
| `year` | int | Accident year (2006–2024) |
| `month` | int | Month (1–12) |
| `day_of_week` | string | Day in Norwegian (Mandag–Søndag) |
| `latitude` | float | Accident latitude (WGS84) |
| `longitude` | float | Accident longitude (WGS84) |
| `municipality_name` | string | Municipality (e.g., Trondheim, Melhus) |
| `municipality_number` | int | Norwegian municipality number |
| `urban_area` | string | Tettsted (urban) / Ikke tettsted (rural) / Ukjent (unknown) |
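
Because date and time are stored as separate strings, and the weekday is stored as a Norwegian name, a small helper makes them easier to combine and sanity-check. This is a sketch under two assumptions: `accident_date` starts with `YYYY-MM-DD`, and `day_of_week` uses the capitalised names shown above. The function names are hypothetical.

```python
from datetime import date, datetime, time

# Norwegian weekday names as used in `day_of_week`, mapped to Python's weekday() (Monday=0)
NORWEGIAN_WEEKDAYS = {
    "Mandag": 0, "Tirsdag": 1, "Onsdag": 2, "Torsdag": 3,
    "Fredag": 4, "Lørdag": 5, "Søndag": 6,
}


def accident_datetime(accident_date: str, accident_time: str) -> datetime:
    """Combine the date (assumed to start with YYYY-MM-DD) and an HH:MM time into a naive datetime."""
    day = date.fromisoformat(accident_date[:10])  # drop any time suffix after the date
    return datetime.combine(day, time.fromisoformat(accident_time))


def weekday_consistent(accident_date: str, day_of_week: str) -> bool:
    """Check that the Norwegian day name agrees with the calendar date."""
    return date.fromisoformat(accident_date[:10]).weekday() == NORWEGIAN_WEEKDAYS[day_of_week]
```

The returned datetime is naive; the card does not state a time zone, so treat it as local Norwegian time only if that matches your reading of NVDB.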
### Road Information
| Field | Type | Description |
|---|---|---|
| `road_reference` | string | Road reference (e.g., "FV6594 S2D1 m2405") |
| `road_type` | string | Road type, e.g. Vanlig veg/gate (ordinary road/street), Boliggate (residential street), Gang-/sykkelveg (footpath/cycle path) |
| `speed_limit` | int | Posted speed limit (km/h) |
| `road_width` | float | Road width in meters |
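
The `road_reference` string packs several fields together. The parser below is a sketch based on our reading of the NVDB road system reference format: road category letter, status letter, road number, then section (`S`), subsection (`D`) and metre position (`m`), as in the example "FV6594 S2D1 m2405". The regex, the `RoadRef` type and `parse_road_reference` are hypothetical; other reference forms (e.g. intersection or side-road references) are not handled and return `None`.

```python
import re
from typing import Optional, TypedDict


class RoadRef(TypedDict):
    category: str    # e.g. "F" (fylkesveg / county road)
    status: str      # e.g. "V" (assumed: existing road)
    number: int      # road number, e.g. 6594
    section: int     # "S" part
    subsection: int  # "D" part
    metre: int       # "m" part, position along the subsection


# Assumed pattern for references shaped like "FV6594 S2D1 m2405"
_ROAD_REF = re.compile(r"^([ERFKPS])([A-Z])(\d+)\s+S(\d+)D(\d+)\s+m(\d+)$")


def parse_road_reference(ref: str) -> Optional[RoadRef]:
    """Split a road reference into its parts, or return None if it has another shape."""
    m = _ROAD_REF.match(ref.strip())
    if m is None:
        return None
    cat, status, num, s, d, metre = m.groups()
    return RoadRef(
        category=cat, status=status, number=int(num),
        section=int(s), subsection=int(d), metre=int(metre),
    )
```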
### Environmental Conditions (at accident time)
These fields record the conditions at accident time; where they differ from the source image, they are the conditions that were applied when editing it:

| Field | Type | Description |
|---|---|---|
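| `light_conditions` | string | Dagslys (daylight) / Mørkt med vegbelysning (dark, with road lighting) / Mørkt uten vegbelysning (dark, without road lighting) / Tusmørke (twilight) |
| `weather` | string | God sikt opphold (good visibility, no precipitation) / God sikt nedbør (good visibility, precipitation) / Dårlig sikt nedbør (poor visibility, precipitation) / Tåke (fog) / etc. |
| `road_surface_condition` | string | Tørr bar veg (dry, bare road) / Våt bar veg (wet, bare road) / Snø/isbelagt (snow/ice covered) / Delvis snø/is (partly snow/ice) / Glatt (slippery) |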
| `temperature` | float | Temperature in °C at accident time |
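
For training a road condition classifier it can help to collapse these Norwegian strings into a few coarse classes. The mapping below is illustrative only: the Norwegian keys come from the values listed above (actual strings may differ slightly, e.g. "Snø / isbelagt veg"), while the English labels, the keyword matching and the function names are our own hypothetical choices.

```python
from typing import Dict, Optional

# Hypothetical coarse labels for light conditions
LIGHT_LABELS: Dict[str, str] = {
    "Dagslys": "day",
    "Tusmørke": "twilight",
    "Mørkt med vegbelysning": "dark_lit",
    "Mørkt uten vegbelysning": "dark_unlit",
}


def light_label(light_conditions: Optional[str]) -> str:
    """Map a light_conditions value to a coarse label; anything unlisted becomes 'unknown'."""
    return LIGHT_LABELS.get(light_conditions or "", "unknown")


def surface_label(road_surface_condition: Optional[str]) -> str:
    """Collapse road-surface strings into snow_ice / slippery / wet / dry / other by keyword."""
    s = (road_surface_condition or "").lower()
    if "snø" in s or "isbelagt" in s:  # snow or ice, fully or partly covered
        return "snow_ice"
    if "glatt" in s:  # slippery without an explicit snow/ice mention
        return "slippery"
    if "våt" in s:
        return "wet"
    if "tørr" in s:
        return "dry"
    return "other"
```

Keyword matching is deliberately forgiving about spacing and slashes; an exact lookup table would be stricter but breaks on small spelling variants.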
### Accident Details
| Field | Type | Description |
|---|---|---|
| `accident_type` | string | High-level type, e.g. Utforkjøring (run-off-road), Kryssende kjøreretning (crossing directions of travel) |
| `accident_code` | string | Detailed accident description |
| `num_units` | int | Total units involved |
| `num_cars` | int | Number of cars |
| `num_trucks` | int | Number of trucks |
| `num_buses` | int | Number of buses |
| `num_vans` | int | Number of vans |
| `num_mc` | int | Number of motorcycles |
| `num_light_mc` | int | Number of light motorcycles |
| `num_mopeds` | int | Number of mopeds |
| `num_bicycles` | int | Number of bicycles |
| `num_pedestrians` | int | Number of pedestrians |
| `num_escooters` | int | Number of e-scooters |
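
The per-mode counts can be combined into derived features. As one example, the sketch below flags accidents involving road users without a vehicle body around them. Which columns count as "vulnerable" is our assumption, not an NVDB definition, and `involves_vulnerable_user` is a hypothetical name.

```python
from typing import Any, Mapping

# Assumed set of columns counting vulnerable road users
VULNERABLE_COLUMNS = (
    "num_pedestrians", "num_bicycles", "num_escooters",
    "num_mopeds", "num_mc", "num_light_mc",
)


def involves_vulnerable_user(row: Mapping[str, Any]) -> bool:
    """True if any vulnerable-road-user count is positive; missing or None counts as 0."""
    return any((row.get(col) or 0) > 0 for col in VULNERABLE_COLUMNS)
```

Because it takes a plain mapping, it can be passed directly to `ds.filter(involves_vulnerable_user)`.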
### Source Image Metadata (from Vegbilder)
| Field | Type | Description |
|---|---|---|
| `image_timestamp` | string | When the source image was captured (ISO 8601) |
| `image_lat` | float | Source image latitude (WGS84) |
| `image_lon` | float | Source image longitude (WGS84) |
| `image_heading` | float | Camera heading in degrees |
| `image_road_category` | string | Road category: E (European), R (National), F (County) |
| `image_road_number` | int | Road number from Vegbilder |
| `image_lane` | string | Lane code (1 or 2, indicating direction) |
| `image_detected_objects` | string | Auto-detected objects as JSON (e.g., `{"car": "1"}`) |
| `address_text` | string | Nearest address from [Geonorge](https://ws.geonorge.no/) (e.g., "Innherredsveien 1, 7014 TRONDHEIM, TRONDHEIM") |
| `image_distance_m` | float | Distance from accident to source image location (meters) |
| `distance_km` | float | Distance from Trondheim city center (km) |
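
`image_detected_objects` is stored as a JSON string whose counts are themselves strings (e.g. `{"car": "1"}`). A defensive parser, sketched below with the hypothetical name `parse_detected_objects`, converts it to integer counts and treats empty or malformed values as "no detections".

```python
import json
from typing import Dict, Optional


def parse_detected_objects(raw: Optional[str]) -> Dict[str, int]:
    """Turn a JSON string like '{"car": "1"}' into {"car": 1}; empty or invalid input gives {}."""
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    counts: Dict[str, int] = {}
    for label, value in data.items():
        try:
            counts[str(label)] = int(value)
        except (TypeError, ValueError):
            continue  # skip values that are not integer-like
    return counts
```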
### Embedding
| Field | Type | Description |
|---|---|---|
| `embedding` | list[float] | 3072-dim image embedding from `gemini-embedding-2-preview` |
## Usage
### Load the dataset
```python
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")
example = ds[0]
print(example["accident_type"])           # e.g. "Utforkjøring"
print(example["light_conditions"])        # e.g. "Mørkt uten vegbelysning"
print(example["road_surface_condition"])  # e.g. "Snø / isbelagt veg"
print(len(example["embedding"]))          # 3072
```
### Stream the dataset
```python
from datasets import load_dataset

# streaming=True iterates over examples without downloading the whole dataset first
ds = load_dataset("thomasht86/accident-conditions", split="train", streaming=True)
for example in ds:
    image = example["image"]  # decoded road scene image
    conditions = f"{example['light_conditions']} / {example['weather']} / {example['road_surface_condition']}"
    print(f"Accident {example['nvdb_id']}: {conditions}")
```
### Use embeddings for similarity search
```python
import numpy as np
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")
embeddings = np.array(ds["embedding"], dtype=np.float32)  # shape (3791, 3072)

# Cosine similarity of every scene to the first one
query = embeddings[0]
similarities = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
similarities[0] = -np.inf  # exclude the query itself from the results

top_k = np.argsort(similarities)[::-1][:5]  # indices of the 5 most similar scenes
for idx in top_k:
    ex = ds[int(idx)]
    print(f"  {ex['accident_type']} | {ex['light_conditions']} | {ex['weather']} (sim: {similarities[idx]:.3f})")
```
### Filter by conditions
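```python
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")

# Night accidents only ("Mørkt" = dark, with or without road lighting)
night = ds.filter(lambda x: "Mørkt" in str(x["light_conditions"]))

# Winter accidents (November to March) with snow/ice on the road
winter_ice = ds.filter(
    lambda x: x["month"] in (11, 12, 1, 2, 3) and "snø" in str(x["road_surface_condition"]).lower()
)

# High-speed road accidents (speed limit of 80 km/h or more)
fast = ds.filter(lambda x: x["speed_limit"] is not None and x["speed_limit"] >= 80)
```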
## Data Collection Pipeline
1. **Accident data** fetched from NVDB API (traffic accidents in Trondheim area, 2006–2024)
2. **Image matching** via Vegbilder WFS — each accident matched to nearest road image within 100m
3. **Condition editing** via Gemini 3.1 Flash — images edited to match accident lighting, weather, and road surface conditions. ~45% of images needed editing; the rest already matched.
4. **Embeddings** generated via Gemini Batch API (`gemini-embedding-2-preview`, 3072 dims)
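
Step 3 boils down to deciding whether an image needs editing and, if so, describing the target conditions to the editing model. The sketch below illustrates that decision under stated assumptions: the "already matches" baseline is taken to be exactly the daylight/clear/dry strings from the field tables (real values may be spelled differently), missing or unknown values count as a mismatch, and the prompt wording is invented for illustration. It is not the prompt the dataset author used, and it does not call the Gemini API.

```python
from typing import Optional

# Assumed baseline: conditions a 2025 survey image already shows (daylight, clear, dry)
BASELINE = {
    "light_conditions": "Dagslys",
    "weather": "God sikt opphold",
    "road_surface_condition": "Tørr bar veg",
}


def needs_editing(row: dict) -> bool:
    """True if any recorded condition differs from the baseline (missing values also differ)."""
    return any(row.get(field) != value for field, value in BASELINE.items())


def build_edit_prompt(row: dict) -> Optional[str]:
    """Hypothetical prompt text for an image-editing model; None if no edit is needed."""
    if not needs_editing(row):
        return None
    return (
        "Edit this road photo so it realistically shows: "
        f"light = {row.get('light_conditions')}, weather = {row.get('weather')}, "
        f"road surface = {row.get('road_surface_condition')}. "
        "Keep the road layout and camera viewpoint unchanged."
    )
```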
## Intended Uses
- Visual search for accident scenes by condition similarity (embedding-based retrieval)
- Training and evaluation of road condition classifiers
- Analysis of accident patterns by environmental conditions
- Multimodal search applications (text-to-image via shared Gemini embedding space)
- Road safety research and visualization
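
For analysing accident patterns by environmental conditions, a simple cross-tabulation is often the first step. The sketch below counts accidents per combination of two condition fields using only the standard library; `condition_counts` is a hypothetical helper, and mapping missing values to "Ukjent" (unknown) is our choice.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Tuple


def condition_counts(
    rows: Iterable[Mapping[str, Any]],
    fields: Tuple[str, str] = ("light_conditions", "road_surface_condition"),
) -> Counter:
    """Count accidents per combination of condition fields; missing values become 'Ukjent'."""
    counts: Counter = Counter()
    for row in rows:
        key = tuple(row.get(f) or "Ukjent" for f in fields)
        counts[key] += 1
    return counts
```

With the Hugging Face dataset, selecting the needed columns first avoids decoding every image, e.g. `condition_counts(ds.select_columns(["light_conditions", "road_surface_condition"])).most_common(5)`.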
## Limitations
- **AI-edited images**: roughly half of the images are synthetically edited to match accident conditions. Although Gemini produces realistic results, the edited images are not real photographs of the accident scene.
- **Temporal mismatch**: Source images are from 2025; accidents span 2006–2024. Road geometry may have changed.
- **Spatial approximation**: Images are matched within 100m of the accident location, not the exact spot.
- **Coverage**: Limited to the Trondheim area (~40 km radius). For 20 accidents, a matching image was found but could not be edited.
- **Embeddings**: Generated from a preview model (`gemini-embedding-2-preview`) which may change.
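
The spatial and temporal limitations can be partly mitigated by keeping only closer matches. The filter below is a sketch: the 25 m default and the optional year-gap check are illustrative thresholds rather than recommendations from the dataset author, it assumes `image_timestamp` begins with a four-digit year, and `is_close_match` is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def is_close_match(
    row: Mapping[str, Any],
    max_distance_m: float = 25.0,
    max_year_gap: Optional[int] = None,
) -> bool:
    """Keep rows whose source image lies near the accident and, optionally,
    whose accident year is at most `max_year_gap` years before the image year."""
    dist = row.get("image_distance_m")
    if dist is None or dist > max_distance_m:
        return False
    if max_year_gap is not None:
        ts = row.get("image_timestamp")
        year = row.get("year")
        if not ts or year is None:
            return False
        image_year = int(str(ts)[:4])  # assumes the ISO 8601 timestamp starts with YYYY
        if image_year - int(year) > max_year_gap:
            return False
    return True
```

For example, `ds.filter(lambda x: is_close_match(x, max_year_gap=5))` keeps recent accidents with a nearby source image, at the cost of a smaller sample.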
## Citation
If you use this dataset, please credit the original data sources:
```
Statens vegvesen (2025). Vegbilder & NVDB. Norwegian Public Roads Administration.
Licensed under NLOD 2.0: https://data.norge.no/nlod/en/2.0
```
## Related Datasets
- [thomasht86/road-images-and-embeddings](https://huggingface.co/datasets/thomasht86/road-images-and-embeddings) — 34,908 road images from the same area (unedited, with embeddings)
Provided by:
thomasht86



