Vision

Source: DataONE | Updated: 2025-05-12 | Indexed: 2025-11-01

Download link:

https://search.dataone.org/view/sha256:247781c6453c455af86b4aa5facc6cf41e8cbfc28da856db89e1e968e01e7735

Description:

VISION Dataset

VISION (Vehicle Identification and Surveillance through Interactive Natural language) is a benchmark dataset designed for natural language-based vehicle retrieval in real-world surveillance environments.

Why VISION?

Traditional vehicle retrieval models rely heavily on preprocessed representations and auxiliary tools, which limits their applicability in real-world surveillance systems. VISION enables retrieval directly from raw surveillance video using only a multimodal model, without complex preprocessing pipelines.

Key Features

- ~7,000 vehicle clips, 967,705 frames
- Collected from the United States, South Korea, and Indonesia
- Rich, fine-grained natural language annotations
- Context-aware descriptions including vehicle motion and interactions
- Greater diversity in road types, weather, and environments

Limitations of Previous Datasets

The previous benchmark, CityFlow-NL, suffered from:

- Annotation inconsistencies and errors
- Overly simplistic descriptions (e.g., “a black sedan going straight”)
- Lack of diversity in data (limited to daytime, single country)

Contribution

VISION provides a strong foundation for building robust, generalizable retrieval models suitable for complex urban environments and real-time surveillance systems.

© 2025 VISION Dataset Team | For research use only

Created:

2025-10-29
