
ngqtrung/full-modality-video-caption

Source: Hugging Face. Updated: 2025-10-21. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/ngqtrung/full-modality-video-caption
Description:
The Full Modality Video Caption Dataset is a large-scale multimodal video dataset with visual, audio, and integrated captions. It contains 55,940 video segments, each 10 seconds long. Each segment carries three captions: a vision caption (generated by GPT-4o), an audio caption (generated by Qwen3-Omni-30B-A3B-Captioner), and a video caption (an integrated multimodal description generated by Qwen3-Omni-30B-A3B-Instruct). The dataset is distributed in WebDataset format, with video files and metadata in JSON format.
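Since the description says the data ships as WebDataset shards (tar archives in which files sharing a base name form one sample), a reader might iterate a downloaded shard roughly as follows. This is a minimal sketch using only the Python standard library. The function name `iter_samples`, the shard file name, and the `mp4` extension are assumptions for illustration, and the listing does not document the JSON field names, so none are assumed.

```python
import json
import tarfile
from typing import Dict, Iterator


def iter_samples(shard_path: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per sample from a WebDataset-style tar shard.

    Files whose names match up to the first dot of the base name are
    grouped into one sample, keyed by extension (e.g. "mp4", "json").
    Assumes the members of a sample are stored consecutively.
    """
    current_key = None
    sample: Dict[str, object] = {}
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Split "dir/clip0.mp4" into key "dir/clip0" and extension "mp4".
            dirname, _, basename = member.name.rpartition("/")
            stem, _, ext = basename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            data = tar.extractfile(member).read()
            # Parse JSON metadata; keep every other payload as raw bytes.
            sample[ext] = json.loads(data) if ext == "json" else data
    if sample:
        yield sample
```

For example, `for s in iter_samples("shard-000000.tar"): print(s["__key__"], sorted(s))` would list each sample's key and the extensions it contains. The shard name here is hypothetical, so check the repository's file listing for the real names.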
Provider:
ngqtrung