An image-caption dataset
Source: DataCite Commons | Updated: 2026-05-07 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.18856454
Description:
This is an image–caption dataset comprising 30k images, 210k captions, and 304k bounding box annotations, designed to support video analytics research and applications. It contains real-world images depicting activities and events relevant to surveillance, public safety, and abnormal behavior detection.
Contents:
30,000 images
210,000 captions (seven captions per image)
304,043 bounding box annotations (multiple bounding boxes per image)
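The stated counts are internally consistent. 30,000 images times 7 captions gives exactly 210,000 captions, and 304,043 boxes spread over 30,000 images averages about 10.1 boxes per image. The following minimal Python check of that arithmetic uses only the figures listed above. The helper name summarize_counts is hypothetical.

```python
# Figures as stated in the dataset description.
NUM_IMAGES = 30_000
CAPTIONS_PER_IMAGE = 7
NUM_CAPTIONS = 210_000
NUM_BOXES = 304_043


def summarize_counts(num_images, captions_per_image, num_captions, num_boxes):
    """Hypothetical helper: return (captions_consistent, mean_boxes_per_image)."""
    # Total captions should equal images x captions-per-image.
    consistent = num_images * captions_per_image == num_captions
    # Average number of bounding boxes per image.
    mean_boxes = num_boxes / num_images
    return consistent, mean_boxes


consistent, mean_boxes = summarize_counts(
    NUM_IMAGES, CAPTIONS_PER_IMAGE, NUM_CAPTIONS, NUM_BOXES
)
print(consistent, round(mean_boxes, 2))  # True 10.13
```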
File Formats:
Images: PNG, JPG, JPEG
Bounding Boxes: TXT
Captions: CSV
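The listing gives only the file types, not their internal layout. The loader below is therefore a minimal sketch built on two loud assumptions. First, it assumes the captions CSV has a header with columns named "image" and "caption", with one row per caption. Second, it assumes each bounding-box TXT file holds one box per line in the YOLO-style form "class_id x_center y_center width height" with normalized coordinates. It also assumes box files share a stem with their images. All column names, the box layout, and the function names are hypothetical and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict


# ASSUMPTION: captions CSV has a header with "image" and "caption" columns
# (hypothetical names; verify against the real file header).
def load_captions(csv_file, image_col="image", caption_col="caption"):
    """Group caption rows by image filename."""
    captions = defaultdict(list)
    for row in csv.DictReader(csv_file):
        captions[row[image_col]].append(row[caption_col])
    return dict(captions)


# ASSUMPTION: each box line is "class_id x_center y_center width height"
# (YOLO-style, normalized to [0, 1]); the real layout is not documented.
def parse_boxes(txt_file):
    """Parse one per-image box file into (class_id, cx, cy, w, h) tuples."""
    boxes = []
    for line in txt_file:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        cx, cy, w, h = (float(v) for v in parts[1:5])
        boxes.append((class_id, cx, cy, w, h))
    return boxes


# Usage with small in-memory stand-ins for the real files.
demo_csv = io.StringIO(
    "image,caption\n"
    "img_0001.jpg,A person walks past a gate\n"
    "img_0001.jpg,Someone near an entrance\n"
)
demo_txt = io.StringIO("0 0.50 0.40 0.20 0.60\n\n2 0.10 0.20 0.05 0.05\n")

captions = load_captions(demo_csv)
boxes = parse_boxes(demo_txt)
print(len(captions["img_0001.jpg"]), len(boxes))  # 2 2
```

With the real data, grouping by image makes it easy to confirm the seven-captions-per-image figure. You could do this, for example, by checking that every list returned by load_captions has length 7.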
The dataset is intended for:
Vision–Language Modeling
Real-time Object Detection Model Development
Text-based Image Retrieval for Video Analytics
Multimodal Reasoning
Provider:
Zenodo
Created:
2026-03-04



