
Vision

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/S39DQU
Description:
VISION Dataset

VISION (Vehicle Identification and Surveillance through Interactive Natural language) is a benchmark dataset designed for natural language-based vehicle retrieval in real-world surveillance environments.

Why VISION?

Traditional vehicle retrieval models rely heavily on preprocessed representations and auxiliary tools, which limits their applicability in real-world surveillance systems. VISION enables retrieval directly from raw surveillance video using only a multimodal model, without complex preprocessing pipelines.

Key Features

- ~7,000 vehicle clips, 967,705 frames
- Collected from the United States, South Korea, and Indonesia
- Rich, fine-grained natural language annotations
- Context-aware descriptions including vehicle motion and interactions
- Greater diversity in road types, weather, and environments

Limitations of Previous Datasets

The previous benchmark, CityFlow-NL, suffered from:

- Annotation inconsistencies and errors
- Overly simplistic descriptions (e.g., “a black sedan going straight”)
- Lack of diversity in data (limited to daytime, single country)

Contribution

VISION provides a strong foundation for building robust, generalizable retrieval models suitable for complex urban environments and real-time surveillance systems.

© 2025 VISION Dataset Team | For research use only
Created:
2025-05-12