IndoTraffic: Indonesian Mixed Traffic Vehicle Detection Dataset
Source: Zenodo. Updated: 2025-12-15. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17722304
Resource description:
IndoTraffic is a comprehensive computer vision dataset designed for vehicle detection and traffic monitoring in Indonesian mixed traffic conditions. The dataset addresses critical gaps in ground-based traffic surveillance datasets for Southeast Asian contexts, where motorcycle-dominated traffic patterns and diverse environmental conditions present unique challenges for automated detection systems.
Dataset Overview
- Total Images: 4,126 high-resolution images (1920×1080 pixels)
- Total Annotations: 50,710 bounding box annotations
- Geographic Coverage: 3 major Indonesian cities (Denpasar, Yogyakarta, Greater Jakarta)
- Vehicle Classes: 6 classes (Motorcycles, Cars, Trucks, Buses, Pedestrians, Unmotorized Vehicles)
- Temporal Coverage: 24-hour sampling across 5 time periods (morning, midday, evening, night, dusk/dawn)
- Data Splits: Pre-defined train (70%), validation (20%), test (10%) splits with stratification
- Formats: YOLO format (txt) and COCO format (JSON) annotations
Key Features
1. Multi-City Coverage: Three Indonesian urban centers representing diverse traffic characteristics (tourism hub, education city, megacity)
2. Rich Metadata: Comprehensive spatio-temporal metadata including timestamp, city, lighting conditions, and time period classifications
3. Extreme Class Imbalance: Realistic motorcycle-dominated distribution (62.6% motorcycles) reflecting actual Southeast Asian traffic composition
4. 24-Hour Temporal Sampling: Complete diurnal cycle coverage including challenging dusk/dawn transition periods
5. Indonesian Context: Vehicle classes aligned with Indonesian Road Capacity Manual (PKJI 2023) standards
Baseline Performance
The YOLOv8m baseline model achieves:
- Overall mAP@0.5: 76.0% on the test set
- Per-class performance: AP ranges from 97.6% (buses) down to 66.5% (motorcycles)
- Temporal variation: Cohen's d = 0.77 (medium-large effect) between daytime and nighttime
- Novel finding (motorcycle detection paradox): the most frequent class (62.6% of annotations) has the lowest detection performance (66.5% AP), so class frequency does not translate into detection accuracy
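The temporal variation above is reported as an effect size. The following is a minimal sketch of Cohen's d using a pooled sample standard deviation, which is one common variant; the source does not state which variant or which per-image metric was used, so the function name and the sample scores are illustrative assumptions rather than the authors' procedure.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical per-image scores, NOT values taken from the dataset.
    day_scores = [0.82, 0.79, 0.85, 0.77, 0.81]
    night_scores = [0.74, 0.70, 0.78, 0.69, 0.75]
    print(round(cohens_d(day_scores, night_scores), 2))
```

By the usual rule of thumb, values near 0.5 are read as medium effects and values near 0.8 as large, which is consistent with the "medium-large" label attached to 0.77.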
Research Applications
This dataset enables research in:
- Vehicle detection algorithms for mixed traffic
- Small object detection in dense scenes
- Temporal performance analysis
- Geographic domain adaptation
- Class imbalance mitigation strategies
- Real-world traffic monitoring systems
Data Collection
Data were collected from Automated Traffic Counting System (ATCS) cameras at strategic intersections between May and October 2024. Manual annotation was performed using Roboflow Annotate with three-stage quality control verification. All annotations follow standardized protocols with inter-annotator agreement checks.
License and Citation
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Citation: If you use this dataset in your research, please cite:
Suartawan, E., et al. (2024). IndoTraffic: A Large-Scale Dataset for Vehicle Detection in Indonesian Mixed Traffic Conditions. Scientific Data (under review). DOI: [to be assigned]
Dataset DOI: 10.5281/zenodo.[will be assigned upon upload]
Quality Assurance
- Three-stage verification protocol
- Inter-annotator agreement validation
- Stratified splitting with fixed random seed (42) for reproducibility
- Comprehensive documentation and usage examples
- Known limitations transparently documented
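Stratified splitting with a fixed seed can be sketched as follows. This is an illustration of the general technique, not the authors' actual split script: it stratifies on a single caller-supplied key (for example the city), whereas the dataset is described as stratified by city and class distribution, and the function name and record layout are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(items, key, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split items into train/val/test while preserving the share of each stratum."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    splits = {"train": [], "val": [], "test": []}
    # Iterate strata in sorted order so the result does not depend on input order of keys.
    for stratum in sorted(groups):
        members = groups[stratum]
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        splits["train"] += members[:n_train]
        splits["val"] += members[n_train:n_train + n_val]
        splits["test"] += members[n_train + n_val:]  # remainder goes to test
    return splits


if __name__ == "__main__":
    # Hypothetical records: (image name, city). Real metadata fields may differ.
    cities = ["denpasar", "jabodetabek", "yogyakarta"]
    records = [(f"{city}_{i:04d}.jpg", city) for city in cities for i in range(100)]
    result = stratified_split(records, key=lambda rec: rec[1])
    print({name: len(part) for name, part in result.items()})
```

For comparability with published results, the pre-defined splits shipped with the dataset should still be preferred over a re-generated split.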
DATASET CHARACTERISTICS:
Geographic Diversity:
- Denpasar (Bali): Tourism-focused city, moderate traffic density
- Yogyakarta (Central Java): Education hub, student-heavy traffic
- Jabodetabek (Greater Jakarta): Megacity, extreme density
Temporal Coverage:
- Morning Peak: 06:00-10:00 (41% of dataset)
- Midday: 10:00-14:00 (20%)
- Evening Peak: 14:00-18:00 (16%)
- Night: 18:00-06:00 (11%)
- Dusk/Dawn Transition: 05:00-07:00, 17:00-19:00 (12%)
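The listed time windows overlap (for example, 06:00-07:00 falls in both Morning Peak and Dusk/Dawn Transition), and the source does not say how ties are resolved. The sketch below assumes that the dusk/dawn windows take precedence and that intervals are half-open; the label strings and the function name are hypothetical and may not match the metadata shipped with the dataset.

```python
from datetime import datetime


def time_period(ts: datetime) -> str:
    """Map a timestamp to one of the five periods (dusk/dawn precedence is assumed)."""
    hour = ts.hour
    # Assumption: transition windows win wherever they overlap another period.
    if 5 <= hour < 7 or 17 <= hour < 19:
        return "dusk_dawn"
    if 7 <= hour < 10:  # rest of the 06:00-10:00 morning peak
        return "morning_peak"
    if 10 <= hour < 14:
        return "midday"
    if 14 <= hour < 17:  # rest of the 14:00-18:00 evening peak
        return "evening_peak"
    return "night"  # 19:00-05:00 after removing the transition windows


if __name__ == "__main__":
    print(time_period(datetime(2024, 6, 1, 6, 30)))
    print(time_period(datetime(2024, 6, 1, 12, 0)))
```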
Class Distribution:
- Motorcycles: 31,736 (62.6%)
- Cars: 14,895 (29.4%)
- Trucks: 2,301 (4.5%)
- Buses: 1,471 (2.9%)
- Pedestrians: 210 (0.4%)
- Unmotorized: 97 (0.2%)
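The percentages above follow directly from the annotation counts. The sketch below recomputes them and derives inverse-frequency class weights, one common starting point for class-imbalance mitigation. The weighting scheme is an illustrative choice, not something the source says was used for the baseline, and the class keys are assumed names.

```python
# Annotation counts as listed in the class distribution.
CLASS_COUNTS = {
    "motorcycle": 31736,
    "car": 14895,
    "truck": 2301,
    "bus": 1471,
    "pedestrian": 210,
    "unmotorized": 97,
}


def class_shares(counts):
    """Fraction of all annotations belonging to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name}: {share:.1%}")
    for name, weight in inverse_frequency_weights(CLASS_COUNTS).items():
        print(f"{name}: weight {weight:.2f}")
```

With counts this skewed, raw inverse-frequency weights for the rarest classes become very large, so in practice they are often clipped or smoothed (for example with a square root) before use.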
Known Limitations:
1. Class imbalance (approximately 327:1 between motorcycle and unmotorized annotations, based on the listed counts of 31,736 and 97)
2. Minority classes have limited samples (pedestrians n=210, unmotorized n=97)
3. Static camera perspective (no moving vehicle footage)
4. Single camera angle per location
5. Geographic scope limited to Java and Bali islands
6. Weather metadata not validated
Technical Specifications:
- Image Resolution: 1920×1080 pixels
- Image Format: JPEG
- Annotation Format: YOLO (normalized coordinates) and COCO (absolute coordinates)
- Train/Val/Test Split: 70/20/10 (stratified by city and class distribution)
- Random Seed: 42 (for reproducibility)
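The two annotation formats differ in coordinate convention: a YOLO label line stores `class x_center y_center width height` normalized to the image size, while a COCO `bbox` stores `[x_min, y_min, width, height]` in pixels. A minimal conversion sketch, assuming the 1920×1080 resolution stated above (the function names are illustrative):

```python
def yolo_to_coco_bbox(x_center, y_center, width, height, img_w=1920, img_h=1080):
    """Convert a normalized YOLO box to a COCO [x_min, y_min, width, height] box in pixels."""
    box_w = width * img_w
    box_h = height * img_h
    x_min = x_center * img_w - box_w / 2
    y_min = y_center * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


def parse_yolo_line(line, img_w=1920, img_h=1080):
    """Parse one YOLO detection label line into (class_id, COCO-style bbox)."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(value) for value in parts[1:5])
    return class_id, yolo_to_coco_bbox(x_center, y_center, width, height, img_w, img_h)


if __name__ == "__main__":
    # Hypothetical label line, not taken from the dataset.
    print(parse_yolo_line("0 0.5 0.5 0.25 0.25"))
```

The class index order (which integer maps to which class) is not given in this description and should be read from the dataset's own configuration files.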
Baseline Model Details:
- Architecture: YOLOv8m (medium)
- Training: 100 epochs, batch size 16, image size 640×640
- Optimizer: AdamW with cosine annealing
- Data Augmentation: Mosaic, mixup, HSV augmentation, random flip
- Hardware: NVIDIA GPU (training time ~8 hours)
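A training run with these settings might look like the sketch below, which uses the Ultralytics Python package. Only the epochs, batch size, image size, optimizer, and cosine schedule come from the description above. The dataset config path `indotraffic.yaml` and the mixup strength are hypothetical placeholders, and the remaining augmentations are assumed to be left at the library defaults, since the source gives no magnitudes.

```python
# Settings stated in the baseline description, expressed as Ultralytics train() arguments.
TRAIN_ARGS = {
    "data": "indotraffic.yaml",  # hypothetical path to a dataset config file
    "epochs": 100,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "AdamW",
    "cos_lr": True,  # cosine learning-rate schedule
    "mixup": 0.1,    # assumed value; the source only says mixup was used
}


def train_baseline(args=None):
    """Train YOLOv8m with the settings above (requires the `ultralytics` package)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO("yolov8m.pt")  # pretrained medium checkpoint as the starting point (assumed)
    return model.train(**(args or TRAIN_ARGS))


if __name__ == "__main__":
    print(TRAIN_ARGS)
    # train_baseline()  # uncomment to start training; needs a GPU and the dataset on disk
```

Whether the authors started from pretrained weights or trained from scratch is not stated, so results from this sketch should not be expected to reproduce the reported 76.0% mAP@0.5 exactly.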
USAGE RECOMMENDATIONS:
For researchers working with this dataset:
1. Use provided train/val/test splits for fair comparison
2. Report per-class metrics separately (overall mAP can be misleading)
3. Consider temporal metadata for domain-specific analysis
4. Apply class balancing techniques for minority classes
5. Validate motorcycle detection separately due to paradox finding
6. Use metadata for stratified sampling in experiments
For production deployment:
1. Apply calibration factors for volume estimation (see paper)
2. Use time-adaptive confidence thresholds (see Supplementary Information)
3. Monitor performance during dusk/dawn periods (known degradation)
4. Implement quality control triggers for low-confidence frames
5. Plan for periodic model updates (seasonal variations)
QUALITY CONTROL MEASURES:
1. Three-stage annotation verification:
   - Initial annotation by trained annotator
   - Random sample review (10% of each batch)
   - Final consistency check across cities
2. Inter-annotator agreement:
   - Tested on 100-image subset
   - IoU threshold: 0.5
   - Agreement rate: >95%
3. Data validation:
   - Duplicate image detection (none found)
   - Annotation format validation (all passed)
   - Bounding box sanity checks (no out-of-bounds)
   - Metadata consistency verification (all validated)
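Two of the checks above lend themselves to a short sketch: box-level agreement between two annotators at an IoU threshold of 0.5, and a sanity check on YOLO label lines. The source does not define how the agreement rate was computed, so the greedy matching and the denominator used here are assumptions, class labels are ignored for brevity, and all function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def agreement_rate(boxes_a, boxes_b, threshold=0.5):
    """Share of boxes matched one-to-one between annotators (greedy, assumed definition)."""
    if not boxes_a and not boxes_b:
        return 1.0
    unmatched_b = list(boxes_b)
    matched = 0
    for box in boxes_a:
        best = max(unmatched_b, key=lambda other: iou(box, other), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched_b.remove(best)
            matched += 1
    # Extra boxes from either annotator count as disagreement.
    return matched / max(len(boxes_a), len(boxes_b))


def validate_yolo_line(line, num_classes=6, eps=1e-6):
    """Return a list of problems found in one YOLO detection label line (empty if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        xc, yc, w, h = (float(value) for value in parts[1:])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append("class id out of range")
    if w <= 0 or h <= 0:
        problems.append("non-positive box size")
    if xc - w / 2 < -eps or xc + w / 2 > 1 + eps or yc - h / 2 < -eps or yc + h / 2 > 1 + eps:
        problems.append("box extends outside image")
    return problems
```

A call such as `validate_yolo_line("0 0.95 0.5 0.2 0.2")` flags the box as extending past the right image edge, since its right side would sit at 1.05 in normalized coordinates.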
ETHICAL CONSIDERATIONS:
- All images collected from public spaces with appropriate permissions
- No personally identifiable information (license plates blurred if visible)
- Camera locations chosen to minimize privacy concerns
- Dataset use agreement requires ethical research practices
- Commercial use permitted but must respect privacy standards
Contact
For questions, issues, or collaboration inquiries:
- Author: Eka Suartawan
- Institution: Bali Land Transportation Polytechnic, Indonesia
- Email: putu.eka@poltradabali.ac.id
Acknowledgments
This work was conducted independently without institutional or external funding. We thank the traffic management authorities in Denpasar, Yogyakarta, and Jakarta for ATCS camera access.
Keywords: vehicle detection, traffic monitoring, object detection, computer vision dataset, Indonesian traffic, mixed traffic, motorcycle detection, YOLO, deep learning, intelligent transportation systems
Version: 1.0
Release Date: November 2025
Provider: Zenodo
Created: 2025-12-15



