ActivityNet Entities
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/facebookresearch/ActivityNet-Entities
Overview:
ActivityNet-Entities collects noun phrases from video descriptions and annotates them with bounding boxes at the frame-region level. The size of each split is summarized by a 4-tuple (V, D, E, B), giving the number of videos, descriptions, objects, and bounding boxes respectively. The dataset serves as a benchmark for the RGL method and provides detailed annotations for video captioning. The training set contains 10,000 videos, 35,000 descriptions, 432 objects, and 105,000 bounding boxes; the validation set contains 2,500 videos, 8,600 descriptions, 427 objects, and 26,500 bounding boxes; and the test set contains 2,500 videos, 8,500 descriptions, 421 objects, and 26,100 bounding boxes. The target task is generating video descriptions grounded in entities.
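As a rough illustration of the (V, D, E, B) tuple, the per-split figures quoted above can be held in a small Python structure and aggregated. This is only a sketch: the names `SplitStats`, `SPLITS`, `totals`, and `boxes_per_description` are hypothetical and are not part of the dataset's tooling, and the numbers are the rounded counts from the description rather than values read from the annotation files.

```python
from typing import Dict, NamedTuple


class SplitStats(NamedTuple):
    """One (V, D, E, B) tuple; field names are illustrative assumptions."""
    videos: int        # V: number of videos
    descriptions: int  # D: number of descriptions
    objects: int       # E: number of objects
    boxes: int         # B: number of bounding boxes


# Rounded counts copied from the description above.
SPLITS: Dict[str, SplitStats] = {
    "train": SplitStats(10_000, 35_000, 432, 105_000),
    "val": SplitStats(2_500, 8_600, 427, 26_500),
    "test": SplitStats(2_500, 8_500, 421, 26_100),
}


def totals(splits: Dict[str, SplitStats]) -> Dict[str, int]:
    """Sum V, D and B across splits.

    E is left out on purpose: the description does not say how the
    object sets of the splits overlap, so adding them up would be a guess.
    """
    return {
        "videos": sum(s.videos for s in splits.values()),
        "descriptions": sum(s.descriptions for s in splits.values()),
        "boxes": sum(s.boxes for s in splits.values()),
    }


def boxes_per_description(stats: SplitStats) -> float:
    """Average number of bounding boxes per description in one split."""
    return stats.boxes / stats.descriptions


if __name__ == "__main__":
    print(totals(SPLITS))
    for name, stats in SPLITS.items():
        print(name, round(boxes_per_description(stats), 2))
```

With the rounded figures, each split averages roughly three bounding boxes per description, which gives a quick sense of annotation density.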



