Vision
Source: DataONE | Updated: 2025-05-12 | Indexed: 2025-11-01
Download link:
https://search.dataone.org/view/sha256:247781c6453c455af86b4aa5facc6cf41e8cbfc28da856db89e1e968e01e7735
Resource description:
VISION Dataset

VISION (Vehicle Identification and Surveillance through Interactive Natural language) is a benchmark dataset designed for natural language-based vehicle retrieval in real-world surveillance environments.

Why VISION?
Traditional vehicle retrieval models rely heavily on preprocessed representations and auxiliary tools, which limits their applicability in real-world surveillance systems. VISION enables retrieval directly from raw surveillance video using only a multimodal model, without complex preprocessing pipelines.

Key Features
- ~7,000 vehicle clips, 967,705 frames
- Collected from the United States, South Korea, and Indonesia
- Rich, fine-grained natural language annotations
- Context-aware descriptions including vehicle motion and interactions
- Greater diversity in road types, weather, and environments

Limitations of Previous Datasets
The previous benchmark, CityFlow-NL, suffered from:
- Annotation inconsistencies and errors
- Overly simplistic descriptions (e.g., “a black sedan going straight”)
- Lack of diversity in data (limited to daytime, single country)

Contribution
VISION provides a strong foundation for building robust, generalizable retrieval models suitable for complex urban environments and real-time surveillance systems.

© 2025 VISION Dataset Team | For research use only
Created:
2025-10-29



