TopoCoT
Source: ModelScope Community · Updated 2026-04-25 · Indexed 2026-05-03
Download link:
https://modelscope.cn/datasets/yimingyang23/TopoCoT
Resource description:
# TopoCoT Dataset
> **WACV 2026 Workshop**: Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design (TopoCoT)
> **Challenge & Submission**: [Pages](https://huggingface.co/spaces/zhanchao019/test_server_wacv)
> **Baseline Implementation**: [GitHub Repository](https://github.com/TopoCoTWACV26/TopoCoT_code) (We are continuously updating the baseline)
## 📚 Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Documentation Index](#documentation-index)
- [Dataset Structure](#dataset-structure)
- [Data Organization](#data-organization)
- [File Descriptions](#file-descriptions)
- [Coordinate Systems](#coordinate-systems)
- [Coordinate Conversion](#coordinate-conversion)
- [Annotation Format](#annotation-format)
- [Training and Inference](#training-and-inference)
- [Citation](#citation)
- [Contact](#contact)
---
## Overview
The **TopoCoT Dataset** is built upon the OpenLaneV2 framework and features **Chain-of-Thought (CoT) annotations** for enhanced lane topology understanding and HD map generation in autonomous driving scenarios.
**Key Features**:
- 🚗 **700 training scenes** (~20,000 frames) with full annotations
- 🎯 **50 challenging test scenes** for evaluation
- 🧠 **Chain-of-Thought reasoning** annotations for explainable decision-making
- 📷 **7 surround-view cameras** per frame (360° coverage)
- 🗺️ **Lane topology** with centerlines and connectivity graphs
---
## Quick Start
1. **Download the dataset** from ModelScope
2. **Extract the archives**: `TopoCoT_train_part01.tar.gz` to `TopoCoT_train_part10.tar.gz` (training), `TopoCoT_val_part01.tar.gz` (test)
3. **Read the documentation**:
   - Start with this README for dataset structure
   - See [ANNOTATION_CONSTRUCTION_README.md](ANNOTATION_CONSTRUCTION_README.md) to understand how annotations are created
   - See [COT_GENERATION_README.md](COT_GENERATION_README.md) to learn about Chain-of-Thought generation
   - See [EVALUATION_README.md](EVALUATION_README.md) for submission format and evaluation metrics
4. **Get the baseline code**: Clone the [baseline repository](https://github.com/TopoCoTWACV26/TopoCoT_code) for training and inference code
5. **Train your model** using the baseline or your custom VLMs
6. **Submit predictions** in the required JSON format (see [EVALUATION_README.md](EVALUATION_README.md))
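Steps 1 and 2 can be scripted with Python's standard `tarfile` module. This is a minimal sketch assuming each `partXX.tar.gz` is an independent gzip-compressed tar archive (not a byte-split of a single archive) and that all parts were downloaded into one directory; `extract_archives` and the directory names are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, out_dir, pattern="TopoCoT_*_part*.tar.gz"):
    """Extract every archive in archive_dir matching `pattern` into out_dir.

    ASSUMPTION: each part is a self-contained .tar.gz. If the parts turn out
    to be split chunks of one archive, concatenate them before extracting.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob(pattern)):
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter when this Python provides it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")
            else:
                tar.extractall(out_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives("downloads", "TopoCoT")` would unpack both the training parts and `TopoCoT_val_part01.tar.gz`; the internal paths of the archives decide where `data/` ends up.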
---
## Documentation Index
This dataset comes with comprehensive documentation to help you understand the data and participate in the workshop:
| Document | Description |
|----------|-------------|
| **[README.md](README.md)** (this file) | Main documentation covering dataset structure, file formats, coordinate systems, and training guidelines |
| **[ANNOTATION_CONSTRUCTION_README.md](ANNOTATION_CONSTRUCTION_README.md)** | Detailed methodology for constructing geometric annotations, including:<br>• Future waypoint extraction<br>• Driving direction determination<br>• Ego lane identification<br>• Lane topology reasoning |
| **[COT_GENERATION_README.md](COT_GENERATION_README.md)** | Explanation of Chain-of-Thought generation using VLMs:<br>• Why UV coordinate system is essential<br>• Three-stage dialogue design<br>• Multi-modal integration strategy |
| **[EVALUATION_README.md](EVALUATION_README.md)** | Complete evaluation guide for participants:<br>• Required JSON submission format<br>• Evaluation metrics (Chamfer Distance, F-Score, Topology metrics)<br>• Coordinate system requirements<br>• Example prediction files |
---
## Dataset Structure
```
TopoCoT/
├── data
│   ├── data_dict_subset_A_train_lanesegnet.pkl
│   ├── data_dict_subset_A_val_lanesegnet.pkl
│   ├── Trainset/
│   │   ├── 00000/
│   │   │   ├── 315967376899927209/
│   │   │   │   ├── ring_front_center.jpg
│   │   │   │   ├── ring_front_left.jpg
│   │   │   │   ├── ring_front_right.jpg
│   │   │   │   ├── ring_rear_left.jpg
│   │   │   │   ├── ring_rear_right.jpg
│   │   │   │   ├── ring_side_left.jpg
│   │   │   │   ├── ring_side_right.jpg
│   │   │   │   ├── lane_with_drive.json
│   │   │   │   ├── lane_with_drive_bev.json
│   │   │   │   └── TopoCoT.json
│   │   │   └── ... (more timestamps)
│   │   ├── 00001/
│   │   └── ... (scenes 00000 to 00699)
│   └── Testset/
│       ├── 10000/
│       │   ├── 315967933449927213/
│       │   │   ├── ring_front_center.jpg
│       │   │   ├── ring_front_left.jpg
│       │   │   ├── ring_front_right.jpg
│       │   │   ├── ring_rear_left.jpg
│       │   │   ├── ring_rear_right.jpg
│       │   │   ├── ring_side_left.jpg
│       │   │   └── ring_side_right.jpg
│       │   └── ... (more timestamps)
│       └── ... (50 challenging scenes)
```
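The layout above can be walked with `pathlib`. This sketch assumes the `<split>/<scene>/<timestamp>/` nesting shown in the tree and the seven camera file names listed under File Descriptions; `iter_frames` is a hypothetical helper, and annotation entries come back as `None` when a file is missing (as in `Testset/`).

```python
from pathlib import Path

# Camera and annotation file names as listed in this README.
CAMERAS = [
    "ring_front_center", "ring_front_left", "ring_front_right",
    "ring_rear_left", "ring_rear_right", "ring_side_left", "ring_side_right",
]
ANNOTATIONS = ["lane_with_drive.json", "lane_with_drive_bev.json", "TopoCoT.json"]


def _numeric_order(path):
    # Numeric directory names sort correctly when compared by (length, text).
    return (len(path.name), path.name)


def iter_frames(split_dir):
    """Yield one dict per <scene>/<timestamp> frame directory under split_dir."""
    scenes = sorted((p for p in Path(split_dir).iterdir() if p.is_dir()), key=_numeric_order)
    for scene_dir in scenes:
        frames = sorted((p for p in scene_dir.iterdir() if p.is_dir()), key=_numeric_order)
        for frame_dir in frames:
            frame = {
                "scene": scene_dir.name,
                "timestamp": frame_dir.name,
                "images": {cam: frame_dir / f"{cam}.jpg" for cam in CAMERAS},
            }
            for name in ANNOTATIONS:
                path = frame_dir / name
                frame[name] = path if path.exists() else None
            yield frame
```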
---
## Data Organization
### Training Set
- **Archive Files**: `TopoCoT_train_part01.tar.gz` to `TopoCoT_train_part10.tar.gz`
- **Scene IDs**: 00000 to 00699 (700 scenes total)
- **Organization**: Each scene directory contains multiple timestamp-named subdirectories representing sequential video frames
### Test Set
- **Archive File**: `TopoCoT_val_part01.tar.gz`
- **Scenes**: 50 carefully selected challenging scenarios
- **Organization**: Same directory layout as the training set, but each frame directory contains only the seven camera images (no annotation files)
---
## File Descriptions
### Per-Frame Files
Each timestamp directory contains the following files:
#### 1. **Surround-View Images** (7 images)
- `ring_front_center.jpg` - Front center camera
- `ring_front_left.jpg` - Front left camera
- `ring_front_right.jpg` - Front right camera
- `ring_rear_left.jpg` - Rear left camera
- `ring_rear_right.jpg` - Rear right camera
- `ring_side_left.jpg` - Left side camera
- `ring_side_right.jpg` - Right side camera
#### 2. **TopoCoT.json**
Chain-of-Thought (CoT) annotation file containing reasoning steps for lane topology understanding and navigation decision-making.
**Important**: This file uses the **UV/OpenCV coordinate system**. When using `TopoCoT.json`, you must use the corresponding `lane_with_drive.json` annotations to ensure coordinate system consistency.
#### 3. **lane_with_drive.json**
Lane segment and trajectory annotations in **UV/OpenCV coordinate system**:
- **Origin**: Top-left corner of the image; the ego vehicle is at (250, 500)
- **X-axis**: Rightward (positive direction)
- **Y-axis**: Downward (positive direction)
- Contains lane segments, trajectory waypoints, and driving commands
#### 4. **lane_with_drive_bev.json**
Lane segment and trajectory annotations in **BEV (Bird's Eye View) coordinate system** following OpenLaneV2 convention:
- **Origin**: (0, 0)
- **X-axis**: Forward (positive direction)
- **Y-axis**: Leftward (positive direction)
- Same semantic content as `lane_with_drive.json`, but in a different coordinate frame
**Warning**: This file uses a different coordinate system than `TopoCoT.json`. If you use `lane_with_drive_bev.json`, there will be a **coordinate system conflict** with `TopoCoT.json`. For CoT-based tasks, always use `lane_with_drive.json` instead.
### Dataset-Level Files
#### **data_dict_subset_A_train_lanesegnet.pkl**
Python pickle file containing training set metadata including:
- Camera intrinsics and extrinsics
- Vehicle pose information
- Sensor configurations
- Scene metadata
#### **data_dict_subset_A_val_lanesegnet.pkl**
Python pickle file containing test set metadata, with a structure similar to the training set file.
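Both pickle files can be opened with Python's standard `pickle` module. Their internal keys are not documented in this README, so the sketch below only loads the object and summarizes its top level instead of assuming a schema; `load_metadata` and `describe` are hypothetical helper names. Unpickling can execute arbitrary code, so only load files from the official download, and if a file stores NumPy arrays, NumPy must be installed for loading to succeed.

```python
import pickle


def load_metadata(path):
    """Load a data_dict_*.pkl file. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj, max_items=5):
    """Return a short summary of the top level of an object of unknown schema."""
    if isinstance(obj, dict):
        return {"type": "dict", "len": len(obj), "keys": list(obj)[:max_items]}
    if isinstance(obj, (list, tuple)):
        return {"type": type(obj).__name__, "len": len(obj)}
    return {"type": type(obj).__name__}
```

For example, `describe(load_metadata("data/data_dict_subset_A_train_lanesegnet.pkl"))` shows whether the file is a dict or a list before you start indexing into it.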
---
## Coordinate Systems
### UV/OpenCV Coordinate System
Used in `lane_with_drive.json`:
```
        u (x-axis) →
      ┌──────────────────┐
  v   │                  │
 (y)  │    Vehicle at    │
  ↓   │    (250, 500)    │
      │                  │
      └──────────────────┘
```
- **Origin**: Top-left corner (vehicle at (250, 500))
- **Range**: u ∈ [0, 500], v ∈ [0, 1000]
### BEV Coordinate System
Used in `lane_with_drive_bev.json`:
```
                ↑ x (forward)
                │
                │
 y (left) ←─────┼─────
                │ (0, 0) Vehicle
                │
```
- **Origin**: Vehicle position (0, 0)
- **X-range**: [-50m, 50m]
- **Y-range**: [-25m, 25m]
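Plugging the default parameters of the BEV-to-UV conversion in this README (W = 500, H = 1000, x ∈ [-50, 50] m, y ∈ [-25, 25] m) into its formulas gives a simple closed form before integer truncation. Both axes are sampled at 10 px per meter, and the BEV origin lands at (250, 500):

$$
\begin{aligned}
u &= \frac{-y - y_{\min}}{y_{\max} - y_{\min}}\,W = \frac{25 - y}{50}\cdot 500 = 250 - 10\,y,\\
v &= \frac{x_{\max} - x}{x_{\max} - x_{\min}}\,H = \frac{50 - x}{100}\cdot 1000 = 500 - 10\,x.
\end{aligned}
$$

For example, a point 20 m ahead and 3.5 m to the right of the vehicle (x = 20, y = -3.5) maps to u = 285, v = 300.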
---
## Coordinate Conversion
### BEV to UV Transformation
The conversion from BEV coordinates to UV coordinates is defined as:
```python
import numpy as np


def bev_to_opencv(points_xy, W=500, H=1000, Z=40,
                  x_min=-50, x_max=50,
                  y_min=-25, y_max=25,
                  z_min=-2.3, z_max=17):
    """
    Convert BEV coordinates to OpenCV UV coordinates.
    Args:
        points_xy: numpy array of shape (N, 3) with [x, y, z] in BEV frame
        W: Image width (default: 500)
        H: Image height (default: 1000)
        Z: Z-axis scale (default: 40)
        x_min, x_max: BEV X-axis range in meters
        y_min, y_max: BEV Y-axis range in meters
        z_min, z_max: BEV Z-axis range in meters
    Returns:
        numpy array of shape (N, 3) with [u, v, z] in UV coordinates
    """
    pts = np.asarray(points_xy, dtype=np.float32)
    # Negate y so that BEV left (+y) maps to small u (image left)
    x, y, z = pts[:, 0], -pts[:, 1], pts[:, 2]
    u = (y - y_min) / (y_max - y_min) * W
    # Forward (+x) maps to small v (image top)
    v = (x_max - x) / (x_max - x_min) * H
    z = (z - z_min) / (z_max - z_min) * Z
    # astype(int) truncates toward zero rather than rounding
    return np.stack([u, v, z], axis=1).astype(int)
```
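Predictions must be submitted in UV coordinates, but models that work in metric BEV space may also need the reverse mapping. The function below is a sketch of an approximate inverse derived from `bev_to_opencv` with the same default parameters; `opencv_to_bev` is a hypothetical name and not part of the official toolkit. Because the forward function truncates to integers, a round trip recovers x and y only to within about one pixel (0.1 m).

```python
import numpy as np


def opencv_to_bev(points_uv, W=500, H=1000, Z=40,
                  x_min=-50, x_max=50,
                  y_min=-25, y_max=25,
                  z_min=-2.3, z_max=17):
    """Approximate inverse of bev_to_opencv.

    Accepts (N, 2) [u, v] or (N, 3) [u, v, z] and returns BEV [x, y] or
    [x, y, z] in meters (x forward, y left).
    """
    pts = np.asarray(points_uv, dtype=np.float64)
    u, v = pts[:, 0], pts[:, 1]
    x = x_max - v / H * (x_max - x_min)
    y_flipped = u / W * (y_max - y_min) + y_min  # equals -y in the BEV frame
    out = [x, -y_flipped]
    if pts.shape[1] >= 3:
        out.append(pts[:, 2] / Z * (z_max - z_min) + z_min)
    return np.stack(out, axis=1)
```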
**Important**: When using `TopoCoT.json`, always use `lane_with_drive.json` (UV coordinates). Using `lane_with_drive_bev.json` will cause coordinate system conflicts with `TopoCoT.json`.

---
## Annotation Format
### lane_with_drive.json Structure
```json
{
"lanes": [
{
"center_id": "LANE0",
"category": "centerline",
"left_boundary": "solid",
"right_boundary": "dashed",
"offset": 35,
"point_2d": [[u0, v0], [u1, v1], ...]
}
],
"topology": [
["LANE0", "LANE1", "LANE2"],
["LANE3", "LANE4"]
],
"ego": {
"current_lane": "LANE0",
"drive_direction": "Straight",
"downstream_lanes": ["LANE1", "LANE2"],
"downstream_directions": ["Straight", "Turn Left"]
},
"navigation": {
"command": "Straight"
},
"future_waypoints": [[u0, v0], [u1, v1], ...]
}
```
#### Field Descriptions
**Primary Evaluation Targets:**
- **`point_2d`**: The 2D coordinates of each lane segment in UV coordinate system. During testing, the accuracy of lane segment coordinate regression is evaluated based on this field.
- **`topology`**: The topological connections between lane segments. Testing evaluates whether the predicted topology structure matches the ground truth.
**Auxiliary Semantic Information:**
- **`offset`**: Represents the distance from the centerline to both left and right boundaries (lane half-width). This provides geometric context for lane width. Left and right boundaries can be reconstructed from the centerline and offset using the method described below.
- **`left_boundary` / `right_boundary`**: Boundary line types (`solid`, `dashed`, `invisible`), providing lane change feasibility information.
- **`category`**: Lane type (`centerline` for drivable lanes, `ped_crossing` for pedestrian crossings).
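Once a frame's `lane_with_drive.json` is loaded with the standard `json` module, lanes can be indexed by `center_id` and the `topology` groups unpacked into directed pairs. This README does not spell out how a topology group is read, so `topology_edges` encodes one assumption, labeled in the code: the first ID is the upstream lane and the remaining IDs are its successors, which is consistent with the example's `ego.downstream_lanes`. All helper names are hypothetical.

```python
import json


def load_lanes(path):
    """Load lane_with_drive.json and index its lanes by center_id."""
    with open(path, encoding="utf-8") as f:
        ann = json.load(f)
    lanes = {lane["center_id"]: lane for lane in ann.get("lanes", [])}
    return ann, lanes


def topology_edges(topology):
    """Turn topology groups into (upstream, downstream) pairs.

    ASSUMPTION: a group [A, B, C] means A -> B and A -> C. Confirm against
    EVALUATION_README.md before relying on this reading.
    """
    edges = []
    for group in topology:
        head, *successors = group
        edges.extend((head, succ) for succ in successors)
    return edges


def unknown_topology_ids(lanes, topology):
    """Return topology IDs that do not refer to any lane in `lanes`."""
    return sorted({lid for group in topology for lid in group} - set(lanes))
```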
### Reconstructing Lane Boundaries from Centerline and Offset
Left and right lane boundaries can be reconstructed from the centerline coordinates and offset value. This is useful for visualization or detailed lane geometry analysis, though **boundary reconstruction is not required for evaluation**.
```python
import numpy as np


def reconstruct_boundaries(centerline, offset):
    """
    Reconstruct left and right boundaries from centerline and offset.
    Args:
        centerline: numpy array of shape (N, 2) or (N, 3) representing centerline points
        offset: scalar value representing half-width of the lane
    Returns:
        left_boundary: numpy array of shape (N, 2) or (N, 3)
        right_boundary: numpy array of shape (N, 2) or (N, 3)
    """
    centerline = np.asarray(centerline, dtype=np.float64)
    dim = centerline.shape[1]
    # np.cross below needs 3D vectors: pad 2D points with z = 0 and strip it on return
    if dim == 2:
        centerline = np.hstack([centerline, np.zeros((len(centerline), 1))])
    up = np.array([0.0, 0.0, 1.0])
    # Calculate overall direction to determine left/right orientation
    whole_direction = centerline[-1] - centerline[0]
    whole_direction = whole_direction / np.linalg.norm(whole_direction)
    # Calculate orthogonal direction (perpendicular to overall direction)
    whole_orthogonal_direction = np.cross(whole_direction, up)
    if np.dot(whole_orthogonal_direction, np.array([0, 1, 0])) < 0:
        whole_orthogonal_direction = -whole_orthogonal_direction
    whole_orthogonal_direction = whole_orthogonal_direction / np.linalg.norm(whole_orthogonal_direction)
    left_boundary = []
    right_boundary = []
    # Calculate boundaries for each segment
    for i in range(len(centerline) - 1):
        direction = centerline[i + 1] - centerline[i]
        direction = direction / np.linalg.norm(direction)
        # Calculate orthogonal direction for this segment
        orthogonal_direction = np.cross(direction, up)
        # Ensure consistent orientation with overall direction
        if np.dot(orthogonal_direction, whole_orthogonal_direction) < 0:
            orthogonal_direction = -orthogonal_direction
        orthogonal_direction = orthogonal_direction / np.linalg.norm(orthogonal_direction)
        # Calculate boundary points
        left_boundary.append(centerline[i] + orthogonal_direction * offset)
        right_boundary.append(centerline[i] - orthogonal_direction * offset)
    # Add last point, reusing the orientation of the final segment
    left_boundary.append(centerline[-1] + orthogonal_direction * offset)
    right_boundary.append(centerline[-1] - orthogonal_direction * offset)
    # Drop the padded z column again for 2D input
    left_boundary = np.array(left_boundary)[:, :dim]
    right_boundary = np.array(right_boundary)[:, :dim]
    return left_boundary, right_boundary
```
**Note**: Evaluation focuses **only on centerlines** (`point_2d` in the annotations, `point` in submitted predictions) and topology.
**Ego Vehicle Driving Context (`ego`):**
The `ego` field provides crucial information for autonomous driving decision-making based on the "follow your lane to drive" principle:
- **`current_lane`**: The centerline ID where the ego vehicle is currently located
- **`drive_direction`**: The driving direction of the current lane.
- **`downstream_lanes`**: List of downstream centerline IDs connected from the current lane
- **`downstream_directions`**: Driving directions of each downstream lane, providing reference for future trajectory planning
These ego-related fields help the vehicle understand its current position in the road network and make informed driving decisions.
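A short sketch of how the `ego` fields fit together: `downstream_lanes` and `downstream_directions` are parallel lists, so they can be zipped into a lane-to-direction map. `lanes_matching_command` is a hypothetical illustration of the "follow your lane to drive" idea, and it assumes the navigation command uses the same strings as the direction labels (e.g. `"Straight"`), which this README does not guarantee.

```python
def downstream_options(ego):
    """Map each downstream lane ID to its driving direction."""
    lanes = ego.get("downstream_lanes", [])
    dirs = ego.get("downstream_directions", [])
    if len(lanes) != len(dirs):
        raise ValueError("downstream_lanes and downstream_directions differ in length")
    return dict(zip(lanes, dirs))


def lanes_matching_command(ego, command):
    """HYPOTHETICAL heuristic: downstream lanes whose direction equals `command`.

    ASSUMPTION: navigation commands and lane directions share one vocabulary.
    """
    return [lane for lane, d in downstream_options(ego).items() if d == command]
```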
**Navigation and Planning:**
- **`navigation.command`**: High-level navigation command for the ego vehicle
- **`future_waypoints`**: Future trajectory waypoints, representing the planned path
**Chain-of-Thought Reasoning:**
The accompanying `TopoCoT.json` file provides comprehensive Chain-of-Thought reasoning that includes:
- Detailed scene understanding and analysis
- Road topology interpretation and reasoning steps
- Driving suggestions and decision-making rationale for the ego vehicle
- Step-by-step reasoning process connecting perception, topology understanding, and driving actions
This CoT information enables explainable autonomous driving decisions by linking visual perception to high-level driving strategies.
---
## Training and Inference
### Baseline Implementation
We provide a baseline implementation for TopoCoT, which includes:
- Complete training pipeline with three-stage training (BEVFormer pre-training, adapter training, and LoRA fine-tuning)
- Inference and evaluation scripts
- Pre-trained model weights
**Baseline Repository**: [TopoCoT_code](https://github.com/TopoCoTWACV26/TopoCoT_code)
Participants are encouraged to use the baseline as a starting point and explore improvements using state-of-the-art vision-language models and training frameworks.
### Recommended Tools and Frameworks
**Vision-Language Models**:
Participants can choose from existing open-source VLMs, such as:
- **Qwen-VL** series (e.g., Qwen2-VL, Qwen3-VL): Multi-modal models with strong vision-language understanding capabilities
- **LLaVA** series: Instruction-following vision-language models
- **InternVL**: Large-scale vision-language foundation models
- Your own custom VLMs
### Dataset Usage
The dataset provides:
- **Training Set**: 700 scenes (~20,000 frames) with full annotations, including:
  - Surround-view camera images
  - Lane geometry annotations (`lane_with_drive.json` in UV coordinates)
  - Chain-of-Thought reasoning (`TopoCoT.json`)
- **Test Set**: 50 challenging scenes, containing:
  - Surround-view camera images only
  - No ground-truth annotations (withheld for evaluation)
### Evaluation
Participants should develop VLMs that can:
1. **Perceive** the driving scene from surround-view images
2. **Reason** about lane topology and spatial relationships
3. **Predict** lane centerlines (`point`) and topology connections
Evaluation metrics will focus on:
- Lane detection accuracy (centerline coordinates)
- Topology prediction accuracy (connectivity between lanes)
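`EVALUATION_README.md` lists Chamfer Distance among the metrics. For quick local sanity checks, one common symmetric definition is sketched below: the mean of the two directed average nearest-neighbor distances between a predicted and a ground-truth centerline. The official evaluator's averaging, lane matching and any thresholds may differ, so treat `chamfer_distance` (a hypothetical helper) only as a rough proxy.

```python
import numpy as np


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets.

    pred: (N, D) array, gt: (M, D) array, both in the same coordinate
    system (UV pixels for this dataset).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Average nearest distance pred->gt and gt->pred, then take their mean.
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```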
### Expected Output Format
**Important**: Your model must output predictions in a specific JSON format for evaluation. Please refer to **[EVALUATION_README.md](EVALUATION_README.md)** for:
- Complete JSON structure requirements
- Field specifications and data types
- Coordinate system requirements (UV/OpenCV only)
- Example prediction files
**If your model's output format differs from the required format**, you are responsible for implementing a conversion script to transform your model's output into the expected JSON structure before submission.
Key requirements:
- Each prediction must include `segment_id`, `timestamp`, and a JSON string containing `lanes` and `topology`
- All `point` coordinates must be in **UV/OpenCV coordinate system** (not BEV)
- Each lane must have exactly **10 points**
- Topology must reference lane IDs that exist in the `lanes` array
See `./example_pred_json/eval_result_examples.json` for a complete example.
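The key requirements above can be pre-checked locally before submission. The sketch below is not the official checker. `resample_polyline` brings a UV centerline to exactly 10 points spaced evenly by arc length (the README fixes the count, not the sampling scheme), and `validate_content` scans a `lanes`/`topology` payload for lanes with the wrong point count and topology IDs that match no lane. The lane-ID key (`center_id`, borrowed from `lane_with_drive.json`) is an assumption; `EVALUATION_README.md` and `eval_result_examples.json` remain the reference.

```python
import json
import numpy as np


def resample_polyline(points, n=10):
    """Resample a polyline to n points evenly spaced by arc length (endpoints kept)."""
    pts = np.asarray(points, dtype=np.float64)
    if len(pts) == 0:
        raise ValueError("empty polyline")
    # Drop consecutive duplicates so cumulative arc length is strictly increasing.
    keep = np.concatenate([[True], np.linalg.norm(np.diff(pts, axis=0), axis=1) > 0])
    pts = pts[keep]
    if len(pts) == 1:
        return np.repeat(pts, n, axis=0)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each vertex
    targets = np.linspace(0.0, s[-1], n)
    return np.stack([np.interp(targets, s, pts[:, k]) for k in range(pts.shape[1])], axis=1)


def validate_content(content, n_points=10, id_key="center_id", point_key="point"):
    """Return a list of problems in one lanes/topology payload (empty if none).

    `content` may be the JSON string or an already-parsed dict.
    ASSUMPTION: lane IDs are stored under `id_key`.
    """
    if isinstance(content, str):
        content = json.loads(content)
    problems = []
    ids = set()
    for i, lane in enumerate(content.get("lanes", [])):
        ids.add(lane.get(id_key))
        pts = lane.get(point_key, [])
        if len(pts) != n_points:
            problems.append(f"lane {i}: {len(pts)} points, expected {n_points}")
    for group in content.get("topology", []):
        for lid in group:
            if lid not in ids:
                problems.append(f"topology references unknown lane {lid!r}")
    return problems
```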
---
## Citation
If you use this dataset, please cite:
```bibtex
@article{wang2023openlane,
title={Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping},
author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and others},
journal={Advances in Neural Information Processing Systems},
volume={36},
pages={18873--18884},
year={2023}
}
@article{li2023lanesegnet,
title={Lanesegnet: Map learning with lane segment perception for autonomous driving},
author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
journal={arXiv preprint arXiv:2312.16108},
year={2023}
}
@inproceedings{yang2025topo2seq,
title={Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Li, Erlong and Cao, Zhipeng and Zheng, Chao and Mei, Shuqi and Li, Zhen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}
@article{yang2025topostreamer,
title={TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Lin, Hongbin and Fu, Suzhong and Zheng, Chao and Cao, Zhipeng and Li, Erlong and Yan, Chao and Cui, Shuguang and others},
journal={arXiv preprint arXiv:2507.00709},
year={2025}
}
@article{yang2025fastopowm,
title={FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models},
author={Yang, Yiming and Lin, Hongbin and Luo, Yueru and Fu, Suzhong and Zheng, Chao and Yan, Xinrui and Mei, Shuqi and Tang, Kun and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2507.23325},
year={2025}
}
@article{luo2025reltopo,
title={RelTopo: Enhancing Relational Modeling for Driving Scene Topology Reasoning},
author={Luo, Yueru and Zhou, Changqing and Yang, Yiming and Li, Erlong and Zheng, Chao and Mei, Shuqi and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2506.13553},
year={2025}
}
```
---
## Contact
For questions or issues regarding the dataset:
- Open an issue in the repository
- Contact the workshop organizers
- Visit the WACV 2026 Workshop website
---
Provided by: maas
Created: 2025-12-16



