MMTrail | Multimodal dataset | Video content understanding dataset
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
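Dataset overview
MMTrail is a large-scale multimodal video-language dataset with more than 20M trailer clips. Its high-quality multimodal captions integrate context, visual frames, and background music, and are intended to support cross-modal research and fine-grained training of multimodal language models. The dataset provides 2M LLaVA video captions, 2M music captions, and 60M Coca frame captions, covering 27.1k hours of trailer video.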
Download information
Split | Download link | Samples | Video duration | Storage |
---|---|---|---|---|
Training-Coca | Download | 20M | 27.1k hours | ~8.0 TB |
Training | Download | 2.1M | 8.2k hours | ~1.6 TB |
Test (2M(sample 1w)) | Download | 2.1M | 8.2k hours | ~1.6 TB |
Test | TODO (2.77 MB) | 1,000 | 3.5 hours | 794 MB |
Metadata format

    [
      {
        "video_id": "zW1-6V_cN8I",
        "video_path": "group_32/zW1-6V_cN8I.mp4",
        "video_duration": 1645.52,
        "video_resolution": [720, 1280],
        "video_fps": 25.0,
        "clip_id": "zW1-6V_cN8I_0000141",
        "clip_path": "video_dataset_32/zW1-6V_cN8I_0000141.mp4",
        "clip_duration": 9.92,
        "clip_start_end_idx": [27102, 27350],
        "image_quality": 45.510545094807945,
        "of_score": 6.993135,
        "aesthetic_score": [4.515582084655762, 4.1147027015686035, 3.796849250793457],
        "music_caption_wo_vocal": [
          {
            "text": "This song features a drum machine playing a simple beat. A siren sound is played on the low register. Then, a synth plays a descending lick and the other voice starts rapping. This is followed by a descending run. The mid range of the instruments cannot be heard. This song can be played in a meditation center.",
            "time": "0:00-10:00"
          }
        ],
        "vocal_caption": "I was just wondering...",
        "frame_caption": [
          "two people are standing in a room under an umbrella . ",
          "a woman in a purple robe standing in front of a man . ",
          "a man and a woman dressed in satin robes . "
        ],
        "music_caption": [
          {
            "text": "This music is instrumental. The tempo is medium with a synthesiser arrangement and digital drumming with a lot of vibrato and static. The music is loud, emphatic, youthful, groovy, energetic and pulsating. This music is a Electro Trap.",
            "time": "0:00-10:00"
          }
        ],
        "objects": ["bed", "Woman", "wall", "pink robe", "pillow"],
        "background": "Bedroom",
        "ocr_score": 0.0,
        "caption": "The video shows a woman in a pink robe standing in a room with a bed and a table, captured in a series of keyframes that show her in various poses and expressions.",
        "polish_caption": "A woman in a pink robe poses and expresses herself in various ways in a room with a bed and a table, capturing her graceful movements and emotive facial expressions.",
        "merge_caption": "In a cozy bedroom setting, a stunning woman adorned in a pink robe gracefully poses and expresses herself, her movements and facial expressions captured in a series of intimate moments. The scene is set against the backdrop of a comfortable bed and a table, with an umbrella standing in a corner of the room. The video features two people standing together under the umbrella, a woman in a purple robe standing confidently in front of a man, and a man and woman dressed in satin robes, all set to an energetic and pulsating electro trap beat with a synthesiser arrangement and digital drumming. The music is loud and emphatic, capturing the youthful and groovy vibe of the video."
      }
    ]
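
The metadata record above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling. It assumes that `clip_start_end_idx` holds start and end frame indices in the source video at `video_fps`; the page does not state this, but the sample record is consistent with it, since 248 frames at 25 fps give the listed `clip_duration` of 9.92 s. The helper names and the filter thresholds (`min_aesthetic`, `max_ocr`) are hypothetical, and the sample record is trimmed to the fields the code uses.

```python
import json

# Trimmed copy of the sample record shown above (only the fields used here).
SAMPLE_JSON = """
[
  {
    "clip_id": "zW1-6V_cN8I_0000141",
    "video_fps": 25.0,
    "clip_duration": 9.92,
    "clip_start_end_idx": [27102, 27350],
    "aesthetic_score": [4.515582084655762, 4.1147027015686035, 3.796849250793457],
    "ocr_score": 0.0
  }
]
"""


def clip_time_span(record):
    """Return (start_sec, end_sec) of the clip inside its source video.

    Assumption: clip_start_end_idx are frame indices at video_fps.
    """
    start_idx, end_idx = record["clip_start_end_idx"]
    fps = record["video_fps"]
    return start_idx / fps, end_idx / fps


def mean_aesthetic(record):
    """Average the per-frame aesthetic scores of one clip."""
    scores = record["aesthetic_score"]
    return sum(scores) / len(scores)


def select_clips(records, min_aesthetic=4.0, max_ocr=0.1):
    """Keep clips above a mean aesthetic score and below an OCR score.

    Both thresholds are illustrative values, not taken from the dataset.
    """
    return [
        r for r in records
        if mean_aesthetic(r) >= min_aesthetic and r["ocr_score"] <= max_ocr
    ]


sample = json.loads(SAMPLE_JSON)
start, end = clip_time_span(sample[0])
print(sample[0]["clip_id"], round(start, 2), round(end, 2))
print([r["clip_id"] for r in select_clips(sample)])
```

To run this on a downloaded caption file, replace `json.loads(SAMPLE_JSON)` with `json.load` on the opened file, provided that file follows the same list-of-records layout.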
Update log
- [2024/07/30] Released the 2M and 20M caption data files for download.
- [2024/06/10] Set up the GitHub page.
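License
The video samples come from publicly available datasets. Users must follow the applicable licenses when using these video samples. We provide the caption files.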
Citation
@misc{chi2024mmtrailmultimodaltrailervideo,
  title={MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions},
  author={Xiaowei Chi and Yatian Wang and Aosong Cheng and Pengjun Fang and Zeyue Tian and Yingqing He and Zhaoyang Liu and Xingqun Qi and Jiahao Pan and Rongyu Zhang and Mengfei Li and Ruibin Yuan and Yanbing Jiang and Wei Xue and Wenhan Luo and Qifeng Chen and Shanghang Zhang and Qifeng Liu and Yike Guo},
  year={2024},
  eprint={2407.20962},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.20962},
}

- MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions. Hong Kong University of Science and Technology, Peking University, 2024.