Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City
Download link:
https://zenodo.org/record/13828407
Overview
The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird’s-eye view (BeV) perspective. Captured over Songdo International Business District, South Korea, this dataset consists of 5,419 annotated video frames, featuring approximately 272,000 vehicle instances (272,435 according to the per-subset totals under Dataset Composition) categorized into four classes:
Car (including vans and light-duty vehicles)
Bus
Truck
Motorcycle
This dataset can serve as a benchmark for aerial vehicle detection, supporting research and real-world applications in intelligent transportation systems, traffic monitoring, and aerial vision-based mobility analytics. It was developed in the context of a multi-drone experiment aimed at enhancing geo-referenced vehicle trajectory extraction.
⚠️ Important: If you use this dataset in your work, please cite the following reference [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, arXiv preprint arXiv:2411.02136.
(Note: This manuscript shall be replaced by the published version once available.)
Motivation
Publicly available datasets for aerial vehicle detection often exhibit limitations such as:
Non-BeV perspectives with varying angles and distortions
Inconsistent annotation quality, with loose or missing bounding boxes
Lower-resolution imagery, reducing detection accuracy, particularly for smaller vehicles
Lack of annotation detail, especially for motorcycles in dense urban scenes with complex backgrounds
To address these challenges, Songdo Vision provides high-quality human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.
Dataset Composition
The dataset is randomly split into training (80%) and test (20%) subsets:
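| Subset | Images | Car | Bus | Truck | Motorcycle | Total Vehicles |
|--------|--------|-----|-----|-------|------------|----------------|
| Train | 4,335 | 195,539 | 7,030 | 11,779 | 2,963 | 217,311 |
| Test | 1,084 | 49,508 | 1,759 | 3,052 | 805 | 55,124 |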
A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.
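The composition figures can be cross-checked with a few lines of arithmetic. The sketch below hard-codes the counts from the table above; the names `COUNTS`, `subset_total`, `train_fraction`, and `class_share` are illustrative and not part of the dataset.

```python
# Counts copied from the composition table above.
COUNTS = {
    "train": {"images": 4335, "car": 195539, "bus": 7030, "truck": 11779, "motorcycle": 2963},
    "test": {"images": 1084, "car": 49508, "bus": 1759, "truck": 3052, "motorcycle": 805},
}
CLASSES = ("car", "bus", "truck", "motorcycle")


def subset_total(subset: str) -> int:
    """Sum the vehicle instances of all four classes in one subset."""
    return sum(COUNTS[subset][cls] for cls in CLASSES)


def train_fraction() -> float:
    """Share of annotated frames that belong to the training subset."""
    train_images = COUNTS["train"]["images"]
    all_images = sum(COUNTS[s]["images"] for s in COUNTS)
    return train_images / all_images


def class_share(cls: str) -> float:
    """Share of all vehicle instances that belong to one class."""
    total = sum(subset_total(s) for s in COUNTS)
    return sum(COUNTS[s][cls] for s in COUNTS) / total


if __name__ == "__main__":
    print(subset_total("train"), subset_total("test"))
    print(f"train fraction: {train_fraction():.4f}")
    print(f"motorcycle share: {class_share('motorcycle'):.4f}")
```

The motorcycle share comes out at well under 2% of all instances, which is a useful reminder to report per-class metrics rather than a single aggregate score when benchmarking on this data.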
Data Collection
The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.
A fleet of 10 drones monitored 20 busy intersections, executing advanced flight plans to optimize coverage.
4K (3840×2160) RGB video footage was recorded at 29.97 FPS from altitudes of 140–150 meters.
Each drone flew 10 sessions per day, covering peak morning and afternoon periods.
The experiment produced 12 TB of raw 4K video data.
More details on the experimental setup and data processing pipeline are available in [1].
Bounding Box Annotations & Formats
Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.
Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely used formats:
1. COCO JSON format
Single annotation file per dataset subset (i.e., one for training, one for testing).
Contains metadata such as image dimensions, bounding box coordinates, and class labels.
Example snippet:
    {
      "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
      "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
      "categories": [
        {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
        {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
      ]
    }
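As a minimal sketch of how such a file might be consumed, the following Python reads a COCO annotation file with the standard library and converts each `bbox` from the COCO `[x_min, y_min, width, height]` layout to corner coordinates. The helper names (`load_coco`, `coco_bbox_to_corners`, `count_per_class`) are illustrative. `EXAMPLE` repeats the snippet above so the code can be tried without the archive.

```python
import json
from collections import Counter

# Same content as the example snippet above.
EXAMPLE = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 2,
                     "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
    "categories": [
        {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
        {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"},
    ],
}


def load_coco(path):
    """Read a COCO annotation file, e.g. train/coco_annotations.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def coco_bbox_to_corners(bbox):
    """Convert COCO [x_min, y_min, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def count_per_class(coco):
    """Count annotations per category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


if __name__ == "__main__":
    for ann in EXAMPLE["annotations"]:
        print(ann["category_id"], coco_bbox_to_corners(ann["bbox"]))
    print(count_per_class(EXAMPLE))
```

For the real files, `count_per_class(load_coco("train/coco_annotations.json"))` should reproduce the per-class training counts listed under Dataset Composition, assuming the archive is extracted into the working directory.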
2. YOLO TXT format
One annotation file per image, with one line per bounding box in the format: class_id x_center y_center width height
Bounding box values are normalized to [0,1], with the origin at the top-left corner.
Example snippet:
0 0.52 0.63 0.10 0.05 # Car bounding box
2 0.25 0.40 0.15 0.08 # Truck bounding box
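To make the normalization concrete, the sketch below converts one YOLO label line to pixel corner coordinates. It assumes the usual YOLO field order (class index, box center x, box center y, box width, box height) and the 3840×2160 frame size stated above. The function name `yolo_line_to_pixels` is illustrative, and the trailing `# ...` comments are stripped only because they appear in the example snippet.

```python
def yolo_line_to_pixels(line, img_w=3840, img_h=2160):
    """Convert one YOLO label line to (class_id, (x_min, y_min, x_max, y_max)) in pixels.

    Assumes the fields are: class_id x_center y_center width height,
    all box values normalized to [0, 1] with the origin at the top-left corner.
    """
    # Drop an optional trailing comment such as "# Car bounding box".
    fields = line.split("#", 1)[0].split()
    class_id = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:5])
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    for example in ("0 0.52 0.63 0.10 0.05 # Car bounding box",
                    "2 0.25 0.40 0.15 0.08 # Truck bounding box"):
        print(yolo_line_to_pixels(example))
```

For the first example line, the box is 0.10 × 3840 = 384 pixels wide and 0.05 × 2160 = 108 pixels high, centered near pixel (1996.8, 1360.8).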
3. Pascal VOC XML format
One annotation file per image, structured in XML.
Contains image properties and absolute pixel coordinates for each bounding box.
Example snippet:
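    <annotation>
      <filename>0001.jpg</filename>
      <size>
        <width>3840</width>
        <height>2160</height>
        <depth>3</depth>
      </size>
      <object>
        <name>car</name>
        <bndbox>
          <xmin>500</xmin>
          <ymin>600</ymin>
          <xmax>600</xmax>
          <ymax>650</ymax>
        </bndbox>
      </object>
    </annotation>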
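A Pascal VOC file of this kind can be read with Python's standard `xml.etree.ElementTree` module. The sketch below assumes the conventional VOC element names (`annotation`, `filename`, `size`, `object`, `name`, `bndbox`). `EXAMPLE_XML` restates the example values, and the helpers `parse_voc` and `voc_to_yolo` are illustrative names, not part of the dataset tooling.

```python
import xml.etree.ElementTree as ET

# The example values from above, in the conventional Pascal VOC layout.
EXAMPLE_XML = """<annotation>
  <filename>0001.jpg</filename>
  <size><width>3840</width><height>2160</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>600</xmax><ymax>650</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, (width, height), [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        corners = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), corners))
    return root.findtext("filename"), (width, height), boxes


def voc_to_yolo(corners, img_w, img_h):
    """Convert absolute corners to normalized (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = corners
    return ((xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)


if __name__ == "__main__":
    filename, (w, h), boxes = parse_voc(EXAMPLE_XML)
    for name, corners in boxes:
        print(filename, name, corners, voc_to_yolo(corners, w, h))
```

Since the dataset already ships all three formats, the conversion helper is mainly useful for sanity-checking that the formats agree on a given frame.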
File Structure
The dataset is provided as two compressed archives:
1. Training Data (train.zip, 12.91 GB)
train/
│── coco_annotations.json # COCO format
│── images/
│ ├── 0001.jpg
│ ├── ...
│── labels/
│ ├── 0001.txt # YOLO format
│ ├── 0001.xml # Pascal VOC format
│ ├── ...
2. Testing Data (test.zip, 3.22 GB)
test/
│── coco_annotations.json
│── images/
│ ├── 00027.jpg
│ ├── ...
│── labels/
│ ├── 00027.txt
│ ├── 00027.xml
│ ├── ...
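Given this layout, each image can be paired with its YOLO and Pascal VOC label files by matching file stems. The following is a minimal sketch assuming the archives have been extracted as shown above. `paired_samples` is an illustrative helper that returns `None` for any label file it does not find.

```python
from pathlib import Path


def paired_samples(subset_dir):
    """Yield (image_path, yolo_path_or_None, voc_path_or_None) for one subset.

    Assumes the layout shown above: <subset>/images/*.jpg and
    <subset>/labels/<same stem>.txt / .xml.
    """
    subset_dir = Path(subset_dir)
    labels_dir = subset_dir / "labels"
    for image_path in sorted((subset_dir / "images").glob("*.jpg")):
        yolo_path = labels_dir / f"{image_path.stem}.txt"
        voc_path = labels_dir / f"{image_path.stem}.xml"
        yield (
            image_path,
            yolo_path if yolo_path.exists() else None,
            voc_path if voc_path.exists() else None,
        )


if __name__ == "__main__":
    # Report images that lack one of the two per-image label files.
    for image, yolo, voc in paired_samples("train"):
        if yolo is None or voc is None:
            print("missing label for", image.name)
```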
Additional Files
README.md – Dataset documentation (this description)
LICENSE.txt – Creative Commons Attribution 4.0 License
names.txt – Class names (one per line)
data.yaml – Example YOLO configuration file for training/testing
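The class list in names.txt can be turned into an index-to-name lookup for the YOLO labels. The sketch below makes two assumptions that should be checked against the actual files: that names.txt lists the classes in the order car, bus, truck, motorcycle, and that YOLO class indices are the COCO category ids minus one. Both are consistent with the example snippets in this description (YOLO `0` is a car and `2` a truck; COCO `1` is car and `3` truck). The helper names are illustrative.

```python
def read_class_names(text):
    """Parse the contents of a names file: one class name per line, blanks ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def coco_to_yolo_id(coco_category_id):
    """Map a 1-based COCO category id to a 0-based YOLO class index.

    Assumption: both formats list the classes in the same order.
    """
    return coco_category_id - 1


if __name__ == "__main__":
    # Assumed contents of names.txt; replace with open("names.txt").read().
    names = read_class_names("car\nbus\ntruck\nmotorcycle\n")
    for coco_id in (1, 2, 3, 4):
        print(coco_id, "->", coco_to_yolo_id(coco_id), names[coco_to_yolo_id(coco_id)])
```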
Citation & Attribution
Preferred Citation:
If you use Songdo Vision for any purpose, whether in academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying manuscript [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, arXiv preprint arXiv:2411.02136.
(Note: This manuscript shall be replaced by the published version once available.)
Note: Although Zenodo automatically provides a formal dataset citation (shown below), we kindly request that you reference the manuscript as the primary source of this work.
Dataset Citation (for archival purposes):
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City (v1). Zenodo. DOI: 10.5281/zenodo.13828408.
Created: 2025-03-17



