Vision

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://doi.org/10.7910/DVN/S39DQU

Resource description:

VISION Dataset

VISION (Vehicle Identification and Surveillance through Interactive Natural language) is a benchmark dataset designed for natural language-based vehicle retrieval in real-world surveillance environments.

Why VISION?

Traditional vehicle retrieval models rely heavily on preprocessed representations and auxiliary tools, which limits their applicability in real-world surveillance systems. VISION enables retrieval directly from raw surveillance video using only a multimodal model, without complex preprocessing pipelines.

Key Features

- ~7,000 vehicle clips, 967,705 frames
- Collected from the United States, South Korea, and Indonesia
- Rich, fine-grained natural language annotations
- Context-aware descriptions including vehicle motion and interactions
- Greater diversity in road types, weather, and environments

Limitations of Previous Datasets

The previous benchmark, CityFlow-NL, suffered from:

- Annotation inconsistencies and errors
- Overly simplistic descriptions (e.g., “a black sedan going straight”)
- Lack of diversity in data (limited to daytime, single country)

Contribution

VISION provides a strong foundation for building robust, generalizable retrieval models suitable for complex urban environments and real-time surveillance systems.

© 2025 VISION Dataset Team | For research use only

Created:

2025-05-12
