TUT-ARG/AVCaps
Hugging Face | Updated 2025-01-10 | Indexed 2025-04-19
Download link:
https://hf-mirror.com/datasets/TUT-ARG/AVCaps
Description:
AVCaps is a multimodal audio-visual captioning dataset containing 2061 video clips with a total duration of 28.8 hours. It provides captions for the audio, visual, and joint audio-visual modalities, supporting tasks such as multimodal captioning, multimodal retrieval, and video content understanding. The dataset is split into training, validation, and test sets, and includes audio captions, visual captions, audio-visual captions, and additional audio-visual captions generated by GPT-4.
Provided by:
TUT-ARG



