MusicCaps

Name: MusicCaps
Creator: maas
Published: 2025-12-05 12:14:07
License: No description provided

ModelScope community; updated 2025-12-05, indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/google/MusicCaps


Resource description:

# Dataset Card for MusicCaps

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://kaggle.com/datasets/googleai/musiccaps
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The MusicCaps dataset contains **5,521 music examples, each of which is labeled with an English *aspect list* and a *free text caption* written by musicians**.

An aspect list is, for example, *"pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead"*, while the caption consists of multiple sentences about the music, e.g., *"A low sounding male voice is rapping over a fast paced drums playing a reggaeton beat along with a bass. Something like a guitar is playing the melody along. This recording is of poor audio-quality. In the background a laughter can be noticed. This song may be playing in a bar."*

The text focuses solely on describing *how* the music sounds, not on metadata such as the artist name.

The labeled examples are 10s music clips from the [**AudioSet**](https://research.google.com/audioset/) dataset (2,858 from the eval split and 2,663 from the train split).

Please cite the corresponding paper when using this dataset: http://arxiv.org/abs/2301.11325 (DOI: `10.48550/arXiv.2301.11325`)

### Dataset Usage

The published dataset takes the form of a `.csv` file that contains the IDs of YouTube videos and their start/end timestamps. To use this dataset, one must download the corresponding YouTube videos and chunk them according to the start/end times.

The following repository has an example script and notebook to load the clips. The notebook also includes a Gradio demo that helps explore some samples: https://github.com/nateraw/download-musiccaps-dataset

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

#### ytid

YT ID pointing to the YouTube video in which the labeled music segment appears. You can listen to the segment by opening `https://youtu.be/watch?v={ytid}&start={start_s}`

#### start_s

Position in the YouTube video at which the music starts.

#### end_s

Position in the YouTube video at which the music ends. All clips are 10s long.

#### audioset_positive_labels

Labels for this segment from the AudioSet (https://research.google.com/audioset/) dataset.

#### aspect_list

A list of aspects describing the music.

#### caption

A multi-sentence free text caption describing the music.

#### author_id

An integer for grouping samples by who wrote them.

#### is_balanced_subset

If this value is true, the row is a part of the 1k subset which is genre-balanced.

#### is_audioset_eval

If this value is true, the clip is from the AudioSet eval split. Otherwise it is from the AudioSet train split.
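
The usage notes and field descriptions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official loading code: the two CSV rows are made up for the example, the serialization of `aspect_list` as a Python-style list literal and of the boolean columns as `True`/`False` strings is assumed, and `Clip`, `load_clips`, and `ffmpeg_trim_args` are hypothetical helper names. Downloading the videos themselves is out of scope here; the sketch only parses rows and builds the arguments one could pass to `ffmpeg` to cut a clip from an already downloaded file.

```python
import ast
import csv
import io
from dataclasses import dataclass

# Made-up rows for illustration only; these are NOT real MusicCaps entries.
SAMPLE_CSV = '''ytid,start_s,end_s,audioset_positive_labels,aspect_list,caption,author_id,is_balanced_subset,is_audioset_eval
abc123DEF45,30,40,"/m/04rlf","['pop', 'mellow piano melody']","A mellow piano plays.",4,False,True
xyz987UVW65,120,130,"/m/04rlf","['rock', 'distorted guitar']","A loud guitar riff.",7,True,False
'''


@dataclass
class Clip:
    ytid: str
    start_s: int
    end_s: int
    aspects: list
    caption: str
    is_balanced_subset: bool
    is_audioset_eval: bool

    @property
    def duration_s(self) -> int:
        # The card states that all clips are 10 s long.
        return self.end_s - self.start_s

    @property
    def listen_url(self) -> str:
        # URL template taken from the card's description of the ytid field.
        return f"https://youtu.be/watch?v={self.ytid}&start={self.start_s}"


def parse_bool(text: str) -> bool:
    # Assumption: booleans are serialized as "True"/"False" (any casing).
    return text.strip().lower() == "true"


def load_clips(csv_text: str) -> list:
    """Parse MusicCaps-style CSV text into a list of Clip records."""
    clips = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clips.append(Clip(
            ytid=row["ytid"],
            start_s=int(float(row["start_s"])),
            end_s=int(float(row["end_s"])),
            # Assumption: aspect_list is stored as a Python-style list literal.
            aspects=ast.literal_eval(row["aspect_list"]),
            caption=row["caption"],
            is_balanced_subset=parse_bool(row["is_balanced_subset"]),
            is_audioset_eval=parse_bool(row["is_audioset_eval"]),
        ))
    return clips


def ffmpeg_trim_args(clip: Clip, src_path: str, dst_path: str) -> list:
    """Build (but do not run) an ffmpeg command that cuts the labeled segment.

    src_path is a hypothetical local file holding the full downloaded audio.
    """
    return ["ffmpeg", "-y", "-ss", str(clip.start_s),
            "-t", str(clip.duration_s), "-i", src_path, dst_path]


if __name__ == "__main__":
    for clip in load_clips(SAMPLE_CSV):
        split = "eval" if clip.is_audioset_eval else "train"
        print(clip.ytid, clip.duration_s, split, clip.listen_url)
```

Filtering to the genre-balanced subset is then a one-line comprehension over `is_balanced_subset`, and the list returned by `ffmpeg_trim_args` can be handed to `subprocess.run` once the source audio is available locally.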
### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

This dataset was shared by [@googleai](https://ai.google/research/)

### Licensing Information

The license for this dataset is cc-by-sa-4.0

### Citation Information

```bibtex
[More Information Needed]
```

### Contributions

[More Information Needed]


Provider:

maas

Created:

2025-04-21
