MUSICAVQA
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/MUSICAVQA
Resource description:
To explore audiovisual scene understanding and spatio-temporal reasoning, we constructed a large-scale audiovisual dataset MUSIC-AVQA, which focuses on the question answering (QA) task. As previously stated, high-quality datasets hold significant value for AVQA research.
Why music performances? Music performance is a typical multimodal scenario comprising abundant audiovisual components and their interactions, making it suitable for investigating effective audiovisual scene understanding and reasoning.
Basic Information: We manually collected a large number of music performance videos from YouTube. We selected 22 instruments, including guitar, cello, and xylophone, and designed 9 audiovisual question types around them, covering three scenarios: audio, visual, and audio-visual. Annotations were collected using our GSAI annotation system.
Key Features:
- 3 typical multimodal scenarios
- 22 types of instruments
- 4 categories: string instruments, wind instruments, percussion instruments, and keyboard instruments
- 9,290 videos totaling over 150 hours in duration
- 7,423 real-world videos
- 1,867 synthetic videos
- 9 audiovisual question types
- 45,867 question-answer pairs
- Exhibiting diversity, complexity, and dynamics
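The counts in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the dataset's tooling; the variable names are illustrative, and the duration figure is only a lower bound because the source says "over 150 hours".

```python
# Figures quoted in the Key Features list.
real_videos = 7_423
synthetic_videos = 1_867
total_videos = 9_290
qa_pairs = 45_867
total_hours_lower_bound = 150  # the source only states "over 150 hours"

# The real and synthetic counts should add up to the stated total.
assert real_videos + synthetic_videos == total_videos

# Derived averages (illustrative only; not reported by the dataset authors).
qa_per_video = qa_pairs / total_videos
min_avg_minutes = total_hours_lower_bound * 60 / total_videos

print(f"QA pairs per video: {qa_per_video:.2f}")
print(f"Average video length: at least {min_avg_minutes:.2f} minutes")
```

Under these figures, each video carries roughly five question-answer pairs on average, and the average clip runs a little under one minute at the stated lower bound.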
Personal data/human subjects: The videos in MUSIC-AVQA are publicly available on YouTube, and the annotations were completed via crowdsourcing. We explained to the crowdworkers how the data would be used. Our dataset does not contain any personally identifiable information or offensive content.
Provider:
OpenDataLab
Created:
2023-02-13
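Collected summary
Dataset introduction

Background and challenges
Background overview
MUSIC-AVQA is a large-scale audiovisual question answering dataset focused on music performance scenes, built to explore multimodal scene understanding and spatio-temporal reasoning. It contains 9,290 videos (over 150 hours) and 45,867 question-answer pairs, covers 22 instruments, and defines 9 audiovisual question types, offering both diversity and complexity. The videos come from YouTube, and the annotations were produced by a research institution through crowdsourcing. The dataset is intended for research on audiovisual question answering.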
The content above was collected and summarized by 遇见数据集.



