mehmetkeremturkcan/UrbanOmniView
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/mehmetkeremturkcan/UrbanOmniView
Resource description:
---
language:
- en
license: cc-by-nc-4.0
task_categories:
- keypoint-detection
- object-detection
tags:
- autonomous-driving
- 3d-object-detection
- bird-eye-view
- pose-estimation
- synthetic-data
- v2x
- multi-view
- urban
- traffic
size_categories:
- 10K<n<50K
---
# UrbanOmniView Dataset
**A Multi-Perspective Multi-Class Dataset for Urban Traffic Participant Pose Estimation and Tracking**
[Model: UrbanOmniDetect on Hugging Face](https://huggingface.co/mehmetkeremturkcan/UrbanOmniDetect)
[Paper: arXiv](https://arxiv.org/abs/XXXX.XXXXX)
[Code: GitHub](https://github.com/mkturkcan/urbanomnidetect)
<p align="center">
<img src="https://github.com/mkturkcan/urbanomnidetect/raw/main/assets/urbanomniview.png" width="100%" alt="UrbanOmniDetect pipeline overview"/>
</p>
## Dataset Description
UrbanOmniView is a large-scale dataset for calibration-free monocular 3D object detection across multiple camera viewpoints. It combines real-world driving data, real-world infrastructure data, and high-fidelity synthetic data to cover the full range of perspectives encountered in modern urban sensing: ego-vehicle dashcams, pole-mounted traffic cameras, and aerial drones.
Each annotated object carries a 2D bounding box, a class label, and eight ordered 2D keypoints that represent the projections of its 3D bounding box corners onto the image plane. The first four keypoints are the top corners and the last four are the ground-contact corners. This annotation format enables both standard 2D detection and calibration-free 3D reasoning without requiring camera intrinsics at inference time.
## Classes
The dataset contains three object classes relevant to urban traffic scenarios:
| Class | Description |
|:------|:-----------|
| car | Passenger vehicles, trucks, vans, buses |
| person | Pedestrians |
| bike | Bicycles, motorcycles, scooters |
## Annotation Format
Annotations follow the YOLO keypoint format. Each object occupies one line of a label file, with fields in this order:
- **Class label.** Integer index into the class list above.
- **Bounding box.** Normalized center-x, center-y, width, height.
- **Keypoints.** Eight ordered 2D points, each with x, y coordinates. Indices 0 to 3 are the top corners of the 3D bounding box. Indices 4 to 7 are the bottom corners that contact the ground plane.
The keypoint ordering is consistent across all viewpoints, enabling a single model to learn orientation-aware detection regardless of camera placement.
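As an illustration of the format, the sketch below parses one label line into a class index, a bounding box, and the top and bottom keypoint groups. It assumes the standard YOLO keypoint layout with the class index first and 8 x/y pairs after the box, 21 fields in total. The names `BoxAnnotation` and `parse_label_line` are hypothetical and not part of any released tooling.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]

NUM_KEYPOINTS = 8  # eight projected box corners, two values (x, y) each


class BoxAnnotation(NamedTuple):
    class_id: int
    bbox: Tuple[float, float, float, float]  # center-x, center-y, width, height
    top: List[Point]     # keypoints 0 to 3: top corners
    bottom: List[Point]  # keypoints 4 to 7: ground-contact corners


def parse_label_line(line: str) -> BoxAnnotation:
    """Parse one line: class, 4 box values, then 8 (x, y) keypoints."""
    fields = line.split()
    expected = 1 + 4 + NUM_KEYPOINTS * 2
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")

    class_id = int(fields[0])
    values = [float(v) for v in fields[1:]]
    bbox = (values[0], values[1], values[2], values[3])

    # Remaining values are consecutive x, y pairs.
    coords = values[4:]
    keypoints = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
    return BoxAnnotation(class_id, bbox, keypoints[:4], keypoints[4:])
```

Splitting the keypoints into `top` and `bottom` mirrors the index convention described above, so downstream code can reason about the ground-contact footprint without hard-coding indices.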
## Data Sources
### KITTI
The KITTI Vision Benchmark Suite provides stereo images and 3D annotations from a car-mounted sensor rig in Karlsruhe, Germany. We use the left-camera images and project the provided 3D box labels into 2D keypoints using the calibration matrices. Calibration data is used only for label generation, never at inference time.
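The projection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' label-generation code: it uses the KITTI label convention (dimensions `h`, `w`, `l`; location at the bottom-center of the box in camera coordinates with y pointing down; `rotation_y` about the camera y-axis) and a 3x4 projection matrix. The order of corners within the top and bottom groups is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def box_corners_camera(h: float, w: float, l: float,
                       x: float, y: float, z: float, ry: float) -> List[Point3]:
    """Return 8 box corners in camera coordinates: 4 top corners, then 4 bottom."""
    footprint = ((l / 2, w / 2), (l / 2, -w / 2), (-l / 2, -w / 2), (-l / 2, w / 2))
    corners = []
    # y points down in the KITTI camera frame, so the top face sits at y - h.
    for dy in (-h, 0.0):
        for dx, dz in footprint:
            # Rotate the footprint offset about the y-axis by ry, then translate.
            rx = math.cos(ry) * dx + math.sin(ry) * dz
            rz = -math.sin(ry) * dx + math.cos(ry) * dz
            corners.append((x + rx, y + dy, z + rz))
    return corners


def project(P: Sequence[Sequence[float]], point: Point3) -> Point2:
    """Project a camera-frame 3D point to pixel coordinates with a 3x4 matrix."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    u, v, s = (sum(p * c for p, c in zip(row, homogeneous)) for row in P)
    if s <= 0:
        raise ValueError("point is behind the camera")
    return (u / s, v / s)


def box_to_keypoints(P: Sequence[Sequence[float]], corners: List[Point3],
                     width: int, height: int) -> List[Point2]:
    """Project all corners and normalize by image size (normalization assumed)."""
    return [(u / width, v / height) for u, v in (project(P, c) for c in corners)]
```

Because the matrix is only needed to produce the keypoint labels, nothing in this step has to be available to the detector at inference time.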
### DAIR-V2X
DAIR-V2X is a vehicle-infrastructure cooperative dataset collected at real intersections in Beijing. We use the infrastructure-side images, which are captured from elevated pole-mounted cameras looking down at traffic. The 3D annotations are projected into 2D keypoints using the provided infrastructure camera parameters.
### UE5 Synthetic
We generated 10,000 frames using the Unreal Engine 5 City Sample demo scene. The synthetic pipeline includes:
- **Dynamic environments.** Weather and lighting variation including rain, snow, and day/night cycles.
- **Randomized traffic.** Diverse vehicle, pedestrian, and cyclist assets placed procedurally.
- **Multi-viewpoint cameras.** Ground-level, infrastructure-pole, and drone viewpoints sampled via custom camera rig scripting.
- **Ray-traced rendering.** High-fidelity RGB output at 4K resolution with physically-based lighting.
- **Automatic annotation.** 3D bounding boxes extracted directly from engine object transforms and collision bounds, yielding pixel-accurate projected keypoints.
This synthetic component is released publicly as part of this work.
## Dataset Configuration
The dataset YAML for use with Ultralytics YOLO:
```yaml
path: ./urbanomniview/
train: '../.././urbanomniview_train.txt'
val: '../.././urbanomniview_val.txt'
test: '../.././urbanomniview_test.txt'
nc: 3
names: ['car', 'person', 'bike']
kpt_shape: [8, 2]
```
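The `train`, `val`, and `test` entries point at text files that list image paths, one per line. Assuming the dataset follows the usual Ultralytics layout, where each label file is found by swapping the last `images` directory in the image path for `labels` and changing the extension to `.txt`, the mapping can be sketched as below. The helper names are hypothetical, and the actual directory layout of the download should be checked before relying on this.

```python
import os
from typing import List


def read_image_list(list_file: str) -> List[str]:
    """Read a split file containing one image path per line, skipping blanks."""
    with open(list_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def label_path_for(image_path: str) -> str:
    """Map an image path to its label path (assumed images/ -> labels/ layout)."""
    images_dir = f"{os.sep}images{os.sep}"
    labels_dir = f"{os.sep}labels{os.sep}"
    # Replace only the last occurrence of /images/ in the path.
    swapped = labels_dir.join(image_path.rsplit(images_dir, 1))
    return os.path.splitext(swapped)[0] + ".txt"
```

A quick sanity check before training is to confirm that every path returned by `read_image_list` has an existing file at `label_path_for(path)`.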
## Usage
Download the dataset and point the training script at the config file:
```bash
python train.py --data cfg/dataset/urbanomniview.yaml
```
For custom training with Ultralytics directly:
```python
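from ultralytics import YOLO

# Build the pose model from its YAML config, then load weights from yolo11x.pt.
model = YOLO("yolo11x-pose-p2.yaml").load("yolo11x.pt")
# Train on UrbanOmniView using the dataset config.
model.train(data="cfg/dataset/urbanomniview.yaml", imgsz=640, epochs=100)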
```
## Benchmark Results
Models trained on UrbanOmniView generalize across all three viewpoint categories. The best configuration, YOLO11x with the P2 head at 1920x1920, achieves:
| Benchmark | Metric | Score |
|:----------|:-------|:---:|
| UrbanOmniView val | mAP<sub>50:95</sub> | 0.808 |
| KITTI val | AP<sub>3D</sub> Moderate | 30.71 |
| KITTI val | AP<sub>BEV</sub> Moderate | 35.19 |
| DAIR-V2X val | AP @ OKS = 0.50 | 0.938 |
Calibration-dependent baselines trained on single-viewpoint data score near zero when evaluated on viewpoints outside their training distribution. See the paper for detailed comparisons.
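For readers unfamiliar with the OKS threshold in the table, the sketch below computes object keypoint similarity in the COCO style: each keypoint contributes `exp(-d^2 / (2 * area * k^2))` with `k = 2 * sigma`, and the contributions are averaged. The uniform per-keypoint `sigma = 1 / 8` is an assumption chosen for illustration; the evaluation constants behind the reported numbers are not stated here. A prediction counts as a match at "OKS = 0.50" when its similarity to a ground-truth object is at least 0.5.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], area: float,
        sigmas: Optional[Sequence[float]] = None) -> float:
    """COCO-style object keypoint similarity; all keypoints treated as labeled."""
    n = len(gt)
    if n == 0 or len(pred) != n:
        raise ValueError("pred and gt must have the same non-zero length")
    if sigmas is None:
        sigmas = [1.0 / n] * n  # assumption: uniform per-keypoint constant

    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        # Small epsilon guards against a zero-area object.
        total += math.exp(-d2 / (2.0 * area * k * k + 1e-9))
    return total / n
```

Coordinates and `area` must share units, for example pixels and squared pixels of the ground-truth box.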
## Ethical Considerations
- **KITTI and DAIR-V2X** contain real-world street imagery. Faces and license plates may be visible. Users should comply with the original dataset licenses and local privacy regulations.
- **UE5 Synthetic** data contains no real individuals and raises no privacy concerns.
- The dataset is intended for research in autonomous driving, traffic safety, and cooperative perception. We discourage use for mass surveillance or any application that violates individual privacy rights.
## Citation
```bibtex
@inproceedings{turkcan2026urbanomnidetect,
title = {Calibration-Free View-Agnostic Monocular 3D Object Detection for Urban Scenes},
author = {Turkcan, Mehmet Kerem and Gumaste, Devika and Kostic, Zoran},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR) Workshops},
year = {2026}
}
```
## Acknowledgements
This work was supported by the NSF Engineering Research Center for Smart Streetscapes under Award EEC-2133516, NSF Grants CNS-2450567 and CNS-2038984, and by computing resources from the NVIDIA Academic Grant Program and the Empire AI Consortium.
Provider:
mehmetkeremturkcan



