Vision
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/S39DQU
Description:
VISION Dataset

VISION (Vehicle Identification and Surveillance through Interactive Natural language) is a benchmark dataset designed for natural language-based vehicle retrieval in real-world surveillance environments.

Why VISION?
Traditional vehicle retrieval models rely heavily on preprocessed representations and auxiliary tools, which limits their applicability in real-world surveillance systems. VISION enables retrieval directly from raw surveillance video using only a multimodal model, without complex preprocessing pipelines.

Key Features
- ~7,000 vehicle clips, 967,705 frames
- Collected from the United States, South Korea, and Indonesia
- Rich, fine-grained natural language annotations
- Context-aware descriptions including vehicle motion and interactions
- Greater diversity in road types, weather, and environments

Limitations of Previous Datasets
The previous benchmark, CityFlow-NL, suffered from:
- Annotation inconsistencies and errors
- Overly simplistic descriptions (e.g., "a black sedan going straight")
- Lack of diversity in data (limited to daytime, single country)

Contribution
VISION provides a strong foundation for building robust, generalizable retrieval models suitable for complex urban environments and real-time surveillance systems.

© 2025 VISION Dataset Team | For research use only
Created:
2025-05-12



