thirdeyelabs/indian-road-dataset
Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/thirdeyelabs/indian-road-dataset
Description:
---
license: cc-by-4.0
task_categories:
- object-detection
- image-segmentation
task_ids:
- vehicle-detection
tags:
- autonomous-driving
- indian-roads
- dashcam
- bdd100k
- computer-vision
- detection
pretty_name: Indian Road Driving Dataset
size_categories:
- 100K<n<1M
---
# 🚗 Indian Road Driving Dataset
The **Indian Road Driving Dataset** is the largest open dataset of annotated Indian road footage, created by ThirdEye Labs. It addresses the critical gap in autonomous driving datasets for Indian road conditions.
---
## 🌍 Why Indian Roads?
Indian roads present unique challenges absent from existing datasets (BDD100K, nuScenes, Waymo):
- Dense mixed traffic with unpredictable behavior
- Auto-rickshaws, cattle, and informal lane usage
- Extreme lighting conditions
- **63 million vehicles and 1.4 billion people** — yet no large-scale annotated dataset existed
---
## 📊 Dataset Statistics
| Metric | Value |
|--------|-------|
| **Total clips** | 8,441 |
| **Annotated frames** | 646,014 |
| **Object detections** | 6,896,202 |
| **Segmentation masks** | 1,290,463 |
| **GPS-tagged frames** | ✅ |
| **Annotation format** | BDD100K |
| **Capture device** | CP Plus dashcam |
| **Location** | Delhi NCR, India |
| **Conditions** | Day · Night · Dusk · Rain |
---
## 🏷️ Detection Classes (12 classes)
- **person** — Pedestrians
- **rider** — Motorcyclists/cyclists with rider
- **car** — Passenger cars
- **truck** — Trucks and tempos
- **bus** — Buses
- **motorcycle** — Motorcycles (unridden)
- **bicycle** — Bicycles
- **autorickshaw** — Auto-rickshaws (tuk-tuks)
- **animal** — Cattle, dogs, animals on road
- **vehicle fallback** — Unclassified vehicles
- **traffic light** — Traffic signals
- **traffic sign** — Road signs and boards
---
## 📁 Dataset Structure
Data is stored as **646 WebDataset tar shards** (`data/train-00000-of-00646.tar` … `data/train-00645-of-00646.tar`), each containing ~1,000 frames. Each frame has 3 files inside the shard:
```
{clip_id}_{frame:04d}.jpg # keyframe image
{clip_id}_{frame:04d}.png # segmentation mask
{clip_id}_{frame:04d}.json # BDD100K annotations (detections + scene attributes)
```
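A minimal sketch of reading one downloaded shard with only the Python standard library. It assumes the three files of a frame share a basename and sit next to each other in the archive (the usual WebDataset convention, not confirmed for this dataset); `iter_frames` is a hypothetical helper name.

```python
import json
import tarfile


def iter_frames(tar_file):
    """Yield (key, files) for each frame in an open tarfile.TarFile.

    `files` maps an extension ("jpg", "png", "json") to the raw bytes.
    Assumption: members that belong to one frame are adjacent in the tar.
    """
    current_key, files = None, {}
    for member in tar_file:
        if not member.isfile():
            continue
        key, _, ext = member.name.rpartition(".")
        if key != current_key and files:
            # The basename changed, so the previous frame is complete.
            yield current_key, files
            files = {}
        current_key = key
        files[ext] = tar_file.extractfile(member).read()
    if files:
        yield current_key, files


# Usage (path taken from the shard naming shown above):
# with tarfile.open("data/train-00000-of-00646.tar") as tar:
#     for key, files in iter_frames(tar):
#         annotation = json.loads(files["json"])
```

Yielding one frame at a time keeps memory use bounded by a single frame rather than a whole shard.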
Standalone annotation files are also provided for convenient bulk access:
```
annotations/
├── detection.json # BDD100K format — all 646,014 frames (1.3 GB)
└── scene_attributes.json # per-clip weather, time of day, scene type
gps/
└── gps_tracks.json # GPS coordinates per clip
```
---
## 🚀 Quick Start
### Load with 🤗 Datasets
```python
from datasets import load_dataset
ds = load_dataset("thirdeyelabs/indian-road-dataset")
sample = ds["train"][0]
# sample keys: jpg, png, json
```
### Load annotations directly
```python
import json

# Load the standalone BDD100K-format annotation file (about 1.3 GB).
with open("annotations/detection.json") as f:
    annotations = json.load(f)

# BDD100K format: each entry looks like
# { "name": "clip_id/frame", "labels": [{ "category": "car", "box2d": {...} }] }
```
```
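Once the annotations are loaded, a per-class tally is a quick sanity check. The sketch below assumes the top level of `detection.json` is a list of frame entries, as in standard BDD100K label files; `count_categories` is a hypothetical helper, not part of the dataset tooling.

```python
from collections import Counter


def count_categories(frames):
    """Count labels per category across BDD100K-style frame entries.

    Assumption: `frames` is a list of dicts, each with an optional
    "labels" list whose items carry a "category" string.
    """
    counts = Counter()
    for frame in frames:
        # Frames without detections may lack "labels" or hold null.
        for label in frame.get("labels") or []:
            counts[label["category"]] += 1
    return counts


# Usage: count_categories(annotations).most_common(5)
```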
### Download with CLI
```bash
huggingface-cli download thirdeyelabs/indian-road-dataset --repo-type dataset
```
---
## 📐 Annotation Format (BDD100K Schema)
```json
{
  "name": "clip_abc123/0042.jpg",
  "timestamp": 1000,
  "attributes": {
    "weather": "clear",
    "scene": "city street",
    "timeofday": "daytime"
  },
  "labels": [
    {
      "id": 1,
      "category": "car",
      "box2d": { "x1": 296.0, "y1": 242.0, "x2": 477.0, "y2": 379.0 },
      "attributes": { "occluded": false, "truncated": false },
      "track_id": 7
    }
  ]
}
```
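Many training pipelines expect boxes as `(x, y, width, height)` or as normalized center coordinates rather than corner pairs. The following sketch converts a `box2d` entry to both. The image width and height are parameters supplied by the caller, since the frame resolution is not stated here, and both function names are hypothetical.

```python
def box2d_to_xywh(box):
    """Convert BDD100K corner coordinates to (x, y, width, height) in pixels."""
    return (box["x1"], box["y1"], box["x2"] - box["x1"], box["y2"] - box["y1"])


def box2d_to_yolo(box, img_w, img_h):
    """Convert to normalized (cx, cy, w, h), the layout used by YOLO label files.

    img_w and img_h are the image size in pixels; read them from the
    image itself, because the resolution is an unknown here.
    """
    x, y, w, h = box2d_to_xywh(box)
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


# Usage with the box from the sample annotation above:
# box2d_to_xywh({"x1": 296.0, "y1": 242.0, "x2": 477.0, "y2": 379.0})
# -> (296.0, 242.0, 181.0, 137.0)
```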
---
## 🗺️ GPS Coverage
Every clip includes GPS coordinates, enabling:
- Geographic filtering by route/area
- Speed and trajectory analysis
- Map-based dataset exploration
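As an illustration of the speed and trajectory analysis mentioned above, the sketch below computes track length and mean speed from a sequence of `(lat, lon)` points using the haversine formula. The layout of `gps/gps_tracks.json` is not documented here, so extracting the points and the clip duration is left to the caller; all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_length_m(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


def mean_speed_kmh(points, duration_s):
    """Mean speed over a track that took duration_s seconds to drive."""
    return track_length_m(points) / duration_s * 3.6
```

Haversine treats the Earth as a sphere, which is accurate enough for clip-length distances; raw GPS jitter will still inflate the length of slow or stationary tracks.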
---
## 🏗️ Production Pipeline
ThirdEye Labs' end-to-end ML annotation pipeline:
1. **Ingest** — raw MP4s from CP Plus dashcams to S3
2. **Keyframe extraction** — 1 frame/second via FFmpeg
3. **GPS parsing** — matched from `.srt` files
4. **Object detection** — custom YOLO fine-tuned for Indian roads
5. **Semantic segmentation** — SegFormer for drivable areas
6. **Multi-object tracking** — ByteTrack across frames
7. **Scene classification** — weather, lighting, scene type
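Step 2 can be approximated with FFmpeg's `fps` filter. The sketch below only builds the command line; it is not the ThirdEye Labs implementation, and the output naming, JPEG quality setting, and `keyframe_command` helper are assumptions made for illustration.

```python
from pathlib import Path


def keyframe_command(video, out_dir, fps=1):
    """Build an FFmpeg command that samples `fps` frames per second.

    Output files are named {clip_id}_{frame:04d}.jpg, where clip_id is
    assumed to be the video file name without its extension.
    """
    video = Path(video)
    pattern = Path(out_dir) / f"{video.stem}_%04d.jpg"
    return [
        "ffmpeg",
        "-i", str(video),     # input video
        "-vf", f"fps={fps}",  # resample to the requested frame rate
        "-q:v", "2",          # high JPEG quality (assumed setting)
        str(pattern),
    ]


# Usage:
# import subprocess
# subprocess.run(keyframe_command("raw/clip_abc123.mp4", "frames"), check=True)
```

FFmpeg numbers image sequences from 1 by default; whether the published frame indices start at 0 or 1 is not stated, so check this before matching extracted frames to annotations.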
---
## 📜 License
**Creative Commons Attribution 4.0 International (CC BY 4.0)**
Free to use, share, and adapt for any purpose (including commercial) with attribution to **ThirdEye Labs**.
---
## 📚 Citation
```bibtex
@dataset{thirdeyelabs2026indianroad,
title = {Indian Road Driving Dataset},
author = {ThirdEye Labs},
year = {2026},
url = {https://huggingface.co/datasets/thirdeyelabs/indian-road-dataset},
note = {Released under CC BY 4.0}
}
```
---
## 🔗 Links
- 🌐 **Website**: [thirdeyelabs.ai](https://thirdeyelabs.ai)
- 🎬 **Demo**: [thirdeyelabs.ai/demo](https://thirdeyelabs.ai/demo)
- 📧 **Contact**: [thirdeyelabs.ai/contact](https://thirdeyelabs.ai/contact)
---
*Built with ❤️ in India*
Provider:
thirdeyelabs



