
thomasht86/accident-conditions

Source: Hugging Face · Updated 2026-03-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/thomasht86/accident-conditions
Description:
---
license: other
license_name: nlod-2.0
license_link: https://data.norge.no/nlod/en/2.0
language:
- "no"
tags:
- road-images
- norway
- trondheim
- traffic-accidents
- accident-conditions
- embeddings
- multimodal
- geospatial
- gemini
- image-editing
- synthetic-conditions
pretty_name: Norwegian Traffic Accident Scene Images with Embeddings
size_categories:
- 1K<n<10K
task_categories:
- image-feature-extraction
- image-classification
- zero-shot-image-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**
dataset_info:
- config_name: default
  features:
  - {name: image, dtype: image}
  - {name: nvdb_id, dtype: int64}
  - {name: accident_date, dtype: string}
  - {name: accident_time, dtype: string}
  - {name: year, dtype: int64}
  - {name: month, dtype: int64}
  - {name: day_of_week, dtype: string}
  - {name: latitude, dtype: float64}
  - {name: longitude, dtype: float64}
  - {name: municipality_name, dtype: string}
  - {name: municipality_number, dtype: int64}
  - {name: urban_area, dtype: string}
  - {name: road_reference, dtype: string}
  - {name: road_type, dtype: string}
  - {name: speed_limit, dtype: int64}
  - {name: road_width, dtype: float64}
  - {name: light_conditions, dtype: string}
  - {name: weather, dtype: string}
  - {name: road_surface_condition, dtype: string}
  - {name: temperature, dtype: float64}
  - {name: accident_type, dtype: string}
  - {name: accident_code, dtype: string}
  - {name: num_units, dtype: int64}
  - {name: num_cars, dtype: int64}
  - {name: num_trucks, dtype: int64}
  - {name: num_buses, dtype: int64}
  - {name: num_vans, dtype: int64}
  - {name: num_mc, dtype: int64}
  - {name: num_light_mc, dtype: int64}
  - {name: num_mopeds, dtype: int64}
  - {name: num_bicycles, dtype: int64}
  - {name: num_pedestrians, dtype: int64}
  - {name: num_escooters, dtype: int64}
  - {name: image_timestamp, dtype: string}
  - {name: image_lat, dtype: float64}
  - {name: image_lon, dtype: float64}
  - {name: image_heading, dtype: float64}
  - {name: image_road_category, dtype: string}
  - {name: image_road_number, dtype: int64}
  - {name: image_lane, dtype: string}
  - {name: image_detected_objects, dtype: string}
  - {name: address_text, dtype: string}
  - {name: image_distance_m, dtype: float64}
  - {name: distance_km, dtype: float64}
  - {name: embedding, sequence: float32}
  splits:
  - name: train
    num_examples: 3791
---

# Norwegian Traffic Accident Scene Images with Embeddings

![Original vs AI-Edited to Match Accident Conditions](collage.jpg)

A dataset of **~3,800 road images** depicting the environmental conditions at the time of real traffic accidents in the Trondheim region of Norway (2006–2024). Source images from [Statens vegvesen](https://www.vegvesen.no/) (Vegbilder) were, where needed, **AI-edited using Gemini** to realistically match the recorded accident conditions (lighting, weather, road surface). Each image is paired with detailed accident metadata and a 3072-dimensional image embedding.

## Dataset Description

- **Source images**: [Vegbilder](https://vegbilder.atlas.vegvesen.no/) (Statens vegvesen road camera images, 2025)
- **Accident data**: [NVDB](https://nvdbapiles-v3.atlas.vegvesen.no/) (Norwegian Road Database, traffic accidents 2006–2024)
- **License**: [NLOD 2.0](https://data.norge.no/nlod/en/2.0) (Norwegian Licence for Open Government Data), free to use with attribution
- **Attribution**: Statens vegvesen / Norwegian Public Roads Administration
- **Area**: Trondheim, Norway (~40 km radius)
- **Image editing**: Gemini 3.1 Flash (image editing mode); conditions applied based on accident metadata
- **Embeddings**: 3072-dimensional vectors from `gemini-embedding-2-preview`

## How It Was Built

Each accident in the NVDB database was matched to the **nearest road image** from Vegbilder (within 100 m). Where the accident occurred under different environmental conditions than those in the source image (e.g., nighttime, rain, snow/ice on the road), the image was **edited using Gemini** to realistically depict those conditions. Images whose conditions already matched (daylight, clear, dry) were used as-is.
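The matching step can be sketched as a brute-force nearest-neighbour search with a 100 m cutoff. This is a minimal illustration, not the pipeline's actual code. It assumes accidents and images are plain `(lat, lon)` pairs in WGS84 and uses the haversine great-circle distance on a spherical Earth. The real pipeline queries the Vegbilder WFS service instead, and `haversine_m` and `match_nearest_image` are hypothetical helpers.

```python
import math
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption
MAX_MATCH_DISTANCE_M = 100.0  # cutoff stated in the dataset card


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_nearest_image(
    accident: Tuple[float, float],
    images: Sequence[Tuple[float, float]],
    max_distance_m: float = MAX_MATCH_DISTANCE_M,
) -> Optional[Tuple[int, float]]:
    """Hypothetical helper: return (index, distance_m) of the closest image
    within max_distance_m of the accident, or None if no image is close enough."""
    best: Optional[Tuple[int, float]] = None
    for i, (lat, lon) in enumerate(images):
        d = haversine_m(accident[0], accident[1], lat, lon)
        if d <= max_distance_m and (best is None or d < best[1]):
            best = (i, d)
    return best


# Usage with made-up coordinates near Trondheim (illustration only)
accident = (63.4305, 10.3951)
candidates = [(63.4310, 10.3951), (63.4400, 10.4000)]
print(match_nearest_image(accident, candidates))  # first candidate, roughly 55.6 m away
```

The returned distance plays the same role as the `image_distance_m` field. Whether the pipeline computed that field with haversine or another metric is not stated, so small differences are possible. For ~3,800 accidents a linear scan is fine; a spatial index or a bounding-box WFS query would scale better.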
```
Accident metadata (NVDB)
  → Match to nearest road image (Vegbilder WFS)
  → Edit image to accident conditions (Gemini)
  → Generate embedding (Gemini Batch API)
  → Upload to Hugging Face
```

## Dataset Structure

Each example contains the following fields.

### Image & Identifiers

| Field | Type | Description |
|---|---|---|
| `image` | Image | Road scene JPEG, edited to match accident conditions where needed |
| `nvdb_id` | int | Unique accident ID from NVDB |

### Accident Time & Location

| Field | Type | Description |
|---|---|---|
| `accident_date` | string | Date of accident (ISO 8601) |
| `accident_time` | string | Time of accident (HH:MM) |
| `year` | int | Accident year (2006–2024) |
| `month` | int | Month (1–12) |
| `day_of_week` | string | Day of week in Norwegian (Mandag–Søndag) |
| `latitude` | float | Accident latitude (WGS84) |
| `longitude` | float | Accident longitude (WGS84) |
| `municipality_name` | string | Municipality (e.g., Trondheim, Melhus) |
| `municipality_number` | int | Norwegian municipality number |
| `urban_area` | string | Tettsted (urban) / Ikke tettsted (rural) / Ukjent (unknown) |

### Road Information

| Field | Type | Description |
|---|---|---|
| `road_reference` | string | Road reference (e.g., "FV6594 S2D1 m2405") |
| `road_type` | string | Road type (e.g., Vanlig veg/gate = ordinary road/street, Boliggate = residential street, Gang-/sykkelveg = footpath/cycle path) |
| `speed_limit` | int | Posted speed limit (km/h) |
| `road_width` | float | Road width (m) |

### Environmental Conditions (at accident time)

These fields record the conditions at the time of the accident; they were the targets used when editing the source image.

| Field | Type | Description |
|---|---|---|
| `light_conditions` | string | Dagslys (daylight) / Mørkt med vegbelysning (dark, street lighting) / Mørkt uten vegbelysning (dark, no street lighting) / Tusmørke (twilight) |
| `weather` | string | God sikt opphold (good visibility, no precipitation) / God sikt nedbør (good visibility, precipitation) / Dårlig sikt nedbør (poor visibility, precipitation) / Tåke (fog) / etc. |
| `road_surface_condition` | string | Tørr bar veg (dry, bare road) / Våt bar veg (wet, bare road) / Snø/isbelagt (snow/ice covered) / Delvis snø/is (partly snow/ice) / Glatt (slippery) |
| `temperature` | float | Temperature at accident time (°C) |

### Accident Details

| Field | Type | Description |
|---|---|---|
| `accident_type` | string | High-level type (e.g., Utforkjøring = run-off-road, Kryssende kjøreretning = crossing directions of travel) |
| `accident_code` | string | Detailed accident description |
| `num_units` | int | Total units involved |
| `num_cars` | int | Number of cars |
| `num_trucks` | int | Number of trucks |
| `num_buses` | int | Number of buses |
| `num_vans` | int | Number of vans |
| `num_mc` | int | Number of motorcycles |
| `num_light_mc` | int | Number of light motorcycles |
| `num_mopeds` | int | Number of mopeds |
| `num_bicycles` | int | Number of bicycles |
| `num_pedestrians` | int | Number of pedestrians |
| `num_escooters` | int | Number of e-scooters |

### Source Image Metadata (from Vegbilder)

| Field | Type | Description |
|---|---|---|
| `image_timestamp` | string | When the source image was captured (ISO 8601) |
| `image_lat` | float | Source image latitude (WGS84) |
| `image_lon` | float | Source image longitude (WGS84) |
| `image_heading` | float | Camera heading (degrees) |
| `image_road_category` | string | Road category: E (European), R (national), F (county) |
| `image_road_number` | int | Road number from Vegbilder |
| `image_lane` | string | Lane code (1 or 2, indicating direction) |
| `image_detected_objects` | string | Auto-detected objects as a JSON string (e.g., `{"car": "1"}`) |
| `address_text` | string | Nearest address from [Geonorge](https://ws.geonorge.no/) (e.g., "Innherredsveien 1, 7014 TRONDHEIM, TRONDHEIM") |
| `image_distance_m` | float | Distance from the accident to the source image location (m) |
| `distance_km` | float | Distance from Trondheim city center (km) |

### Embedding

| Field | Type | Description |
|---|---|---|
| `embedding` | list[float] | 3072-dim image embedding from `gemini-embedding-2-preview` |

## Usage

### Load the dataset

```python
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")

example = ds[0]
print(example["accident_type"])           # e.g. "Utforkjøring"
print(example["light_conditions"])        # e.g. "Mørkt uten vegbelysning"
print(example["road_surface_condition"])  # e.g. "Snø / isbelagt veg"
print(len(example["embedding"]))          # 3072
```

### Stream the dataset

```python
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train", streaming=True)

for example in ds:
    image = example["image"]  # decoded PIL image
    conditions = f"{example['light_conditions']} / {example['weather']} / {example['road_surface_condition']}"
    print(f"Accident {example['nvdb_id']}: {conditions}")
```

### Use embeddings for similarity search

```python
import numpy as np
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")
embeddings = np.array(ds["embedding"], dtype=np.float32)  # shape (3791, 3072)

# Find the scenes most similar to the first one (cosine similarity)
query = embeddings[0]
similarities = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
similarities[0] = -np.inf  # exclude the query itself from the results
top_k = np.argsort(similarities)[-5:][::-1]

for idx in top_k:
    ex = ds[int(idx)]
    print(f"  {ex['accident_type']} | {ex['light_conditions']} | {ex['weather']} (sim: {similarities[idx]:.3f})")
```

### Filter by conditions

```python
from datasets import load_dataset

ds = load_dataset("thomasht86/accident-conditions", split="train")

# Night accidents only (dark, with or without street lighting)
night = ds.filter(lambda x: "Mørkt" in str(x["light_conditions"]))

# Winter accidents (November–March) with snow/ice on the road
winter_ice = ds.filter(
    lambda x: x["month"] in (11, 12, 1, 2, 3)
    and "snø" in str(x["road_surface_condition"]).lower()
)

# Accidents on roads with a speed limit of 80 km/h or more
fast = ds.filter(lambda x: x["speed_limit"] is not None and x["speed_limit"] >= 80)
```

## Data Collection Pipeline

1. **Accident data** fetched from the NVDB API (traffic accidents in the Trondheim area, 2006–2024)
2. **Image matching** via Vegbilder WFS: each accident matched to the nearest road image within 100 m
3. **Condition editing** via Gemini 3.1 Flash: images edited to match the accident's lighting, weather, and road surface conditions. ~45% of images needed editing; the rest already matched.
4. **Embeddings** generated via the Gemini Batch API (`gemini-embedding-2-preview`, 3072 dims)

## Intended Uses

- Visual search for accident scenes by condition similarity (embedding-based retrieval)
- Training and evaluation of road condition classifiers
- Analysis of accident patterns by environmental conditions
- Multimodal search applications (text-to-image via the shared Gemini embedding space)
- Road safety research and visualization

## Limitations

- **AI-edited images**: ~55% of images are synthetically edited to match accident conditions. The edits aim to look realistic, but they are not photographs of the actual accident scene.
- **Temporal mismatch**: Source images are from 2025, while accidents span 2006–2024. Road geometry may have changed in the meantime.
- **Spatial approximation**: Images are matched within 100 m of the accident location, not at the exact spot.
- **Coverage**: Limited to the Trondheim area (~40 km radius). 20 accidents with matched images could not be edited.
- **Embeddings**: Generated with a preview model (`gemini-embedding-2-preview`), which may change.

## Citation

If you use this dataset, please credit the original data sources:

```
Statens vegvesen (2025). Vegbilder & NVDB. Norwegian Public Roads Administration.
Licensed under NLOD 2.0: https://data.norge.no/nlod/en/2.0
```

## Related Datasets

- [thomasht86/road-images-and-embeddings](https://huggingface.co/datasets/thomasht86/road-images-and-embeddings): 34,908 road images from the same area (unedited, with embeddings)
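The card states that images whose accident conditions already matched the baseline (daylight, clear, dry) were used without editing. This suggests a rough heuristic for separating likely-real photos from likely-edited ones. The sketch below is hypothetical, not part of the dataset tooling. It assumes the baseline labels are exactly `Dagslys`, `God sikt opphold` and `Tørr bar veg`, as listed in the field tables. Label spellings may vary (compare "Snø/isbelagt" in the table with "Snø / isbelagt veg" in the usage example), so check the real values with `ds.unique("light_conditions")` and similar calls first.

```python
from typing import Any, Mapping

# Assumption: exact label strings for the "daylight, clear, dry" baseline,
# taken from the field tables; verify against ds.unique(<column>) before use.
BASELINE_CONDITIONS = {
    "light_conditions": "Dagslys",
    "weather": "God sikt opphold",
    "road_surface_condition": "Tørr bar veg",
}


def likely_unedited(example: Mapping[str, Any]) -> bool:
    """Hypothetical heuristic: True if all three recorded conditions equal the
    baseline, i.e. the case in which the card says the source image was used as-is.
    Missing or None values count as a mismatch."""
    return all(
        str(example.get(field) or "").strip().lower() == value.lower()
        for field, value in BASELINE_CONDITIONS.items()
    )


# With the dataset loaded as in the Usage section (not run here):
# real_photos = ds.filter(likely_unedited)
# edited = ds.filter(lambda x: not likely_unedited(x))
```

This is only an approximation. The card also notes that 20 matched accidents could not be edited, and the edited share is given as both ~45% and ~55%, so treat the split as indicative rather than exact.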
Provided by:
thomasht86