litwell/MMTrail-20M

Source: Hugging Face | Updated: 2024-11-07 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/litwell/MMTrail-20M
Overview:
MMTrail is a large-scale multimodal video-language dataset of more than 20M trailer clips, each with high-quality multimodal captions that integrate context, visual frames, and background music. It is intended to support cross-modality research and fine-grained training of multimodal language models. The dataset provides 2M+ LLaVA Video captions, 2M+ music captions, and 60M+ Coca frame captions covering 27.1k hours of trailer videos. It aims to close a gap in existing video-language datasets by offering comprehensive and precise descriptions across diverse themes and content types, and trailers bring custom-designed background music that is more coherent with the visual content. All annotations are merged using advanced LLM techniques so that the final captions retain the music perspective while preserving the authority of the visual content.

The dataset is distributed in JSON format, with columns that include videoID, timestamps, the generated caption, and several similarity scores. The README also documents the metadata format and gives download instructions.
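
Since the description above mentions JSON records with a video ID, timestamps, a generated caption, and similarity scores, a minimal Python sketch of reading such records may help. The key spellings (`videoID`, `timestamps`, `caption`, `similarity`) and the sample values are assumptions made up for illustration; the actual schema is defined in the dataset README and may differ.

```python
import json
from typing import Iterator

# Made-up sample records. The key names are assumptions based on the
# column list in the description, not the dataset's confirmed schema.
SAMPLE = """
[
  {"videoID": "abc123", "timestamps": [0.0, 4.2],
   "caption": "A car chase at night.", "similarity": 0.31},
  {"videoID": "abc123", "timestamps": [4.2, 9.0],
   "caption": "A title card fades in.", "similarity": 0.12}
]
"""


def iter_clips(raw: str, min_similarity: float = 0.0) -> Iterator[dict]:
    """Yield clip records whose similarity score is at least min_similarity."""
    for record in json.loads(raw):
        # Records without a score are treated as scoring 0.0.
        if record.get("similarity", 0.0) >= min_similarity:
            yield record


def clip_duration(record: dict) -> float:
    """Return the clip length in seconds, assuming [start, end] timestamps."""
    start, end = record["timestamps"]
    return end - start


if __name__ == "__main__":
    for clip in iter_clips(SAMPLE, min_similarity=0.2):
        print(clip["videoID"], clip_duration(clip), clip["caption"])
```

To work with the real files, one option is `snapshot_download(repo_id="litwell/MMTrail-20M", repo_type="dataset")` from the `huggingface_hub` package, after which the key names in this sketch would need to be adjusted to match the README.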
Provider:
litwell