MuChoMusic dataset
Source: DataCite Commons | Updated: 2025-10-13 | Indexed: 2026-04-25
Download link:
https://dataverse.csuc.cat/citation?persistentId=doi:10.34810/data2642
Description:
<h2>MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models</h2>
<p>MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.</p>
<h3>Note on Audio Files</h3>
<p>This dataset comes without audio files. The audio files can be downloaded from two datasets: <a href="https://doi.org/10.5281/zenodo.10072001" target="_new" rel="noreferrer">SongDescriberDataset (SDD)</a> and <a href="https://www.kaggle.com/datasets/googleai/musiccaps" target="_new" rel="noreferrer">MusicCaps</a>. Please see the <a href="https://github.com/mulab-mir/muchomusic" target="_new" rel="noreferrer">code repository</a> for more information on how to download the audio.</p>
<h3>Citation</h3>
<p>If you use this dataset, please cite our <a href="https://arxiv.org/abs/2408.01337" target="_blank" rel="noopener">paper</a>:</p>
<pre><code>@inproceedings{weck2024muchomusic,
title={MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models},
author={Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry},
booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
year={2024}
}</code></pre>
Weck B, Manco I, Benetos E, Quinton E, Fazekas G, Bogdanov D. MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models. In: Kaneshiro B, Mysore G, Nieto O, Donahue C, Huang CZA, Lee JH, McFee B, McCallum M, editors. Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA.
Provider:
CORA.Repositori de Dades de Recerca
Created:
2025-10-07



