MusicCaps
Source: ModelScope community. Updated 2025-12-05; indexed 2025-04-26.
Download link: https://modelscope.cn/datasets/google/MusicCaps
# Dataset Card for MusicCaps
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Dataset Usage](#dataset-usage)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://kaggle.com/datasets/googleai/musiccaps
- **Repository:**
- **Paper:** http://arxiv.org/abs/2301.11325
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
The MusicCaps dataset contains **5,521 music examples, each labeled with an English *aspect list* and a *free-text caption* written by musicians**. An example aspect list is *"pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead"*, while a caption consists of several sentences about the music, e.g.,
*"A low sounding male voice is rapping over a fast paced drums playing a reggaeton beat along with a bass. Something like a guitar is playing the melody along. This recording is of poor audio-quality. In the background a laughter can be noticed. This song may be playing in a bar."*
The text focuses solely on describing *how* the music sounds, not on metadata such as the artist name.
The labeled examples are 10-second music clips from the [**AudioSet**](https://research.google.com/audioset/) dataset (2,858 from the eval split and 2,663 from the train split).
Please cite the corresponding paper when using this dataset: http://arxiv.org/abs/2301.11325 (DOI: `10.48550/arXiv.2301.11325`)
### Dataset Usage
The dataset is published as a `.csv` file that contains the IDs of YouTube videos and the start/end timestamps of each clip. To use the dataset, you must download the corresponding YouTube videos yourself and cut each one to the clip defined by its start and end times.
The following repository has an example script and notebook to load the clips. The notebook also includes a Gradio demo that helps explore some samples: https://github.com/nateraw/download-musiccaps-dataset
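As a rough illustration of the chunking step described above, the sketch below reads the CSV rows and builds one `ffmpeg` argument list per clip without running it. It assumes the audio has already been downloaded to a hypothetical `downloads/{ytid}.wav` layout; the directory names, the `clip_commands` helper, and the `EXAMPLE_ID_1` row are illustrative and not part of the dataset. Only the column names `ytid`, `start_s`, and `end_s` come from this card.

```python
import csv
import io


def clip_commands(csv_text, audio_dir="downloads", out_dir="clips"):
    """Yield one ffmpeg argument list per CSV row (nothing is executed)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = float(row["start_s"])
        duration = float(row["end_s"]) - start
        yield [
            "ffmpeg",
            "-ss", f"{start:g}",      # seek to the clip start (seconds)
            "-t", f"{duration:g}",    # keep only the clip duration
            "-i", f"{audio_dir}/{row['ytid']}.wav",   # hypothetical input path
            f"{out_dir}/{row['ytid']}.wav",           # hypothetical output path
        ]


# Made-up row for illustration; real rows come from the published .csv file.
sample = "ytid,start_s,end_s\nEXAMPLE_ID_1,30,40\n"
commands = list(clip_commands(sample))
print(commands[0])
```

Each argument list can be passed to `subprocess.run` once the source audio is in place. Downloading the videos is left out here; the repository linked above covers that part.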
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English (the aspect lists and captions are written in English).
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
#### ytid
The ID of the YouTube video in which the labeled music segment appears. You can listen to the segment by opening `https://youtu.be/watch?v={ytid}&start={start_s}`.
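The URL pattern above can be filled in programmatically. This is a minimal sketch: `preview_url` is a hypothetical helper name and `EXAMPLE_ID_1` is a placeholder ID. It assumes `start_s` is a whole number of seconds.

```python
from urllib.parse import urlencode


def preview_url(ytid: str, start_s: float) -> str:
    """Fill the card's URL template for one row."""
    # urlencode escapes any characters that are unsafe in a query string.
    return "https://youtu.be/watch?" + urlencode({"v": ytid, "start": int(start_s)})


url = preview_url("EXAMPLE_ID_1", 30)
print(url)
```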
#### start_s
Position in the YouTube video, in seconds, at which the music clip starts.
#### end_s
Position in the YouTube video, in seconds, at which the music clip ends. All clips are 10 seconds long.
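Since every clip is stated to be 10 seconds long, a quick sanity check on a loaded copy of the CSV is to confirm that `end_s - start_s` equals 10 for each row. The sketch below assumes rows are dictionaries of strings, as `csv.DictReader` would produce; `irregular_clips` and the sample rows are made up for illustration.

```python
def irregular_clips(rows, expected_s=10.0, tolerance_s=1e-6):
    """Return (ytid, duration) pairs whose duration differs from expected_s."""
    bad = []
    for row in rows:
        duration = float(row["end_s"]) - float(row["start_s"])
        if abs(duration - expected_s) > tolerance_s:
            bad.append((row["ytid"], duration))
    return bad


# Made-up rows: the second one is deliberately too short.
rows = [
    {"ytid": "EXAMPLE_ID_1", "start_s": "30", "end_s": "40"},
    {"ytid": "EXAMPLE_ID_2", "start_s": "0", "end_s": "7"},
]
bad = irregular_clips(rows)
print(bad)
```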
#### audioset_positive_labels
Labels for this segment from the AudioSet (https://research.google.com/audioset/) dataset.
#### aspect_list
A list of aspects describing the music.
#### caption
A multi-sentence free text caption describing the music.
#### author_id
An integer identifying the author of the annotation, which allows samples to be grouped by who wrote them.
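One possible use of this field is to collect the clips written by each author, for example to keep the same author out of both sides of a custom split. This is a minimal sketch with made-up rows; `group_by_author` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_author(rows):
    """Map each author_id to the list of ytids that author described."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["author_id"])].append(row["ytid"])
    return dict(groups)


# Made-up rows for illustration only.
rows = [
    {"ytid": "EXAMPLE_ID_1", "author_id": "4"},
    {"ytid": "EXAMPLE_ID_2", "author_id": "7"},
    {"ytid": "EXAMPLE_ID_3", "author_id": "4"},
]
groups = group_by_author(rows)
print(groups)
```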
#### is_balanced_subset
If this value is true, the row is part of the genre-balanced subset of 1k examples.
#### is_audioset_eval
If this value is true, the clip is from the AudioSet eval split. Otherwise it is from the AudioSet train split.
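The fields above can be collected into a typed record, which makes it easy to select the balanced subset or separate the AudioSet eval clips from the train clips. The sketch below is one way to do this, not an official loader. The storage format of the list and boolean columns is not specified in this card, so the parsing rules here (Python-style list literals or comma-separated strings, and `"True"`/`"False"` text) are assumptions, and the sample row is made up.

```python
import ast
from dataclasses import dataclass


@dataclass
class MusicCapsRow:
    ytid: str
    start_s: float
    end_s: float
    audioset_positive_labels: list
    aspect_list: list
    caption: str
    author_id: int
    is_balanced_subset: bool
    is_audioset_eval: bool


def parse_bool(text: str) -> bool:
    # Assumption: booleans are stored as "True"/"False" (any letter case).
    return text.strip().lower() == "true"


def parse_list(text: str) -> list:
    # Assumption: a list cell is either a Python-style literal such as
    # "['pop', 'mellow piano melody']" or a plain comma-separated string.
    text = text.strip()
    if text.startswith("["):
        return [str(item) for item in ast.literal_eval(text)]
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_row(raw: dict) -> MusicCapsRow:
    """Convert one CSV row (a dict of strings) into a typed record."""
    return MusicCapsRow(
        ytid=raw["ytid"],
        start_s=float(raw["start_s"]),
        end_s=float(raw["end_s"]),
        audioset_positive_labels=parse_list(raw["audioset_positive_labels"]),
        aspect_list=parse_list(raw["aspect_list"]),
        caption=raw["caption"],
        author_id=int(raw["author_id"]),
        is_balanced_subset=parse_bool(raw["is_balanced_subset"]),
        is_audioset_eval=parse_bool(raw["is_audioset_eval"]),
    )


# Made-up row for illustration; values do not come from the dataset.
example = parse_row({
    "ytid": "EXAMPLE_ID_1",
    "start_s": "30",
    "end_s": "40",
    "audioset_positive_labels": "LABEL_A,LABEL_B",
    "aspect_list": "['pop', 'mellow piano melody']",
    "caption": "A mellow piano melody over a pop beat.",
    "author_id": "4",
    "is_balanced_subset": "False",
    "is_audioset_eval": "True",
})

rows = [example]
balanced = [r for r in rows if r.is_balanced_subset]
eval_rows = [r for r in rows if r.is_audioset_eval]
train_rows = [r for r in rows if not r.is_audioset_eval]
```

Keeping the parsing rules in small helper functions means only `parse_bool` or `parse_list` needs to change if the actual file uses a different encoding.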
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
This dataset was shared by [@googleai](https://ai.google/research/).
### Licensing Information
This dataset is licensed under CC BY-SA 4.0 (`cc-by-sa-4.0`).
### Citation Information
```bibtex
[More Information Needed]
```
### Contributions
[More Information Needed]
Provider: maas
Created: 2025-04-21