MusicTheoryBench
ModelScope Community · Updated 2025-11-12 · Indexed 2024-05-15
Download link: https://modelscope.cn/datasets/m-a-p/MusicTheoryBench
Resource summary:
[**🌐 DemoPage**](https://ezmonyi.github.io/ChatMusician/) | [**🤗 Dataset**](https://huggingface.co/datasets/m-a-p/MusicPile) | [**🤗 Benchmark**](https://huggingface.co/datasets/m-a-p/MusicTheoryBench) | [**📖 arXiv**](http://arxiv.org/abs/2402.16153) | [💻 **Code**](https://github.com/hf-lin/ChatMusician) | [**🤖 Model**](https://huggingface.co/m-a-p/ChatMusician)
# Dataset Card for MusicTheoryBench
MusicTheoryBench is a benchmark designed to **assess the advanced music understanding capabilities** of current LLMs.
You can easily load it:
```python
from datasets import load_dataset
dataset = load_dataset("m-a-p/MusicTheoryBench")
```
The evaluation code will be available in the coming weeks.
## Dataset Structure
MusicTheoryBench consists of 372 multiple-choice questions, each with four options of which exactly one is correct: 269 questions on music knowledge, 98 on music reasoning, and 5 held out to enable few-shot evaluation.
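Because every question has four options and exactly one correct answer, random guessing scores about 25%. The sketch below shows one possible way to turn the 5 held-out questions into a few-shot prompt and to report accuracy separately for the two subsets. It is not the official evaluation code: the record fields `question`, `options`, `answer` (a letter A to D), and `subset` are hypothetical and may not match the dataset's actual schema.

```python
from collections import defaultdict

LETTERS = "ABCD"  # four options per question, exactly one correct


def format_question(record: dict, with_answer: bool) -> str:
    """Render one question with lettered options (field names are hypothetical)."""
    lines = [record["question"]]
    for letter, option in zip(LETTERS, record["options"]):
        lines.append(f"{letter}. {option}")
    # Demonstrations show the gold letter; the target question leaves it blank.
    lines.append(f"Answer: {record['answer']}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(shots: list, target: dict) -> str:
    """Prepend the held-out demonstration questions to the question being asked."""
    blocks = [format_question(shot, with_answer=True) for shot in shots]
    blocks.append(format_question(target, with_answer=False))
    return "\n\n".join(blocks)


def accuracy_by_subset(records: list, predictions: list) -> dict:
    """Share of correct letter predictions, reported per subset (knowledge / reasoning)."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        total[record["subset"]] += 1
        correct[record["subset"]] += int(predicted == record["answer"])
    return {subset: correct[subset] / total[subset] for subset in total}
```

Reporting the two subsets separately keeps the larger knowledge subset (269 questions) from masking results on the smaller reasoning subset (98 questions).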
## Dataset Details
Despite significant advances in music information retrieval, current research has not clearly defined what advanced music understanding capabilities are.
To measure the advanced music understanding of existing LLMs, [MAP](https://m-a-p.ai/) first defines two critical elements of music understanding: **music knowledge** and **music reasoning**. Both are defined in the [ChatMusician paper](http://arxiv.org/abs/2402.16153).
### Music Knowledge Subset
In the music knowledge subset, the questions cover both Eastern and Western music.
The subset includes 30 topics, such as notes, rhythm, beats, chords, counterpoint, orchestration and instrumentation, and music-related culture and history.
Each major area was examined in a targeted way under expert guidance and divided into subcategories.
For example, the triads section specifically tests the definition, types, and related technical details of triads.
The questions also vary in difficulty, corresponding to high-school and college-level coursework for music majors.
### Music Reasoning Subset
Most questions in the reasoning subset require both music knowledge and reasoning ability. Answering them correctly requires a detailed analysis of the given information and multi-step logical reasoning, such as working out chords, melodies, scales, and rhythms.
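To illustrate the kind of step-by-step chord calculation these questions involve, the following sketch classifies a root-position triad by counting semitones from the root to its third and fifth (4 and 7 for major, 3 and 7 for minor, 3 and 6 for diminished, 4 and 8 for augmented). It is an illustrative example written for this card, not a question or solver taken from the benchmark.

```python
# Pitch classes of the natural note names, with C = 0.
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone distances (root to third, root to fifth) that define each triad quality.
TRIAD_QUALITIES = {
    (4, 7): "major",
    (3, 7): "minor",
    (3, 6): "diminished",
    (4, 8): "augmented",
}


def pitch_class(note: str) -> int:
    """Convert a note name such as 'C', 'F#', or 'Bb' to a pitch class 0-11."""
    pc = NATURALS[note[0].upper()]
    for accidental in note[1:]:
        if accidental == "#":
            pc += 1
        elif accidental == "b":
            pc -= 1
        else:
            raise ValueError(f"unsupported accidental in {note!r}")
    return pc % 12


def triad_quality(root: str, third: str, fifth: str) -> str:
    """Classify a root-position triad from the semitone distances above its root."""
    r = pitch_class(root)
    intervals = ((pitch_class(third) - r) % 12, (pitch_class(fifth) - r) % 12)
    return TRIAD_QUALITIES.get(intervals, "not a tertian triad")
```

Working in pitch classes keeps the arithmetic simple but ignores enharmonic spelling: the notes C, Fb, G would be reported as major even though they are not spelled as a triad, so spelling-sensitive questions would also need interval names.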
## Curation Process
To ensure consistency with human testing standards, MusicTheoryBench was written by a professional college music teacher based on college-level textbooks and exam papers. The content went through multiple rounds of discussion and review by a team of musicians, who carefully selected the questions and manually compiled them into JSON and ABC notation. The questions were then labeled as belonging to either the music knowledge or the music reasoning subset. Because the teacher is from China, half of the questions were originally written in Chinese; they were later translated into English with the GPT-4 Azure API and proofread by the team.
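Since the questions are stored as JSON with music written in ABC notation, a consumer may want to read the ABC header fields, for example `M:` (meter) and `K:` (key). In ABC, a tune header conventionally starts with `X:` and ends with `K:`. The sketch below assumes a hypothetical record with an `abc` field holding the tune; the benchmark's real field names and layout may differ.

```python
import json
import re

# An ABC header line is a single field letter, a colon, then the value (e.g. "M:3/4").
HEADER_LINE = re.compile(r"^([A-Za-z]):\s*(.*)$")


def abc_header(abc: str) -> dict:
    """Collect header fields from an ABC tune, stopping after K:, which closes the header."""
    fields = {}
    for line in abc.strip().splitlines():
        match = HEADER_LINE.match(line.strip())
        if match is None:
            break  # first non-field line: the tune body has started
        letter, value = match.groups()
        fields[letter] = value.strip()
        if letter == "K":
            break
    return fields


def load_question(raw_json: str) -> dict:
    """Parse one JSON question and attach its ABC header (the "abc" field is hypothetical)."""
    record = json.loads(raw_json)
    record["abc_header"] = abc_header(record["abc"])
    return record
```

Stopping at `K:` avoids misreading body lines that also use the letter-colon form, such as `w:` lyric lines.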
### Languages
MusicTheoryBench is primarily in English.
## Limitations
- The MusicTheoryBench results reported in the [ChatMusician paper](http://arxiv.org/abs/2402.16153) were obtained in perplexity (PPL) mode. Direct generation may yield lower scores. See the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/get_started/faq.html#what-are-the-differences-and-connections-between-ppl-and-gen) for more details.
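The difference between the two modes can be sketched roughly as follows. In PPL mode, each option is appended to the prompt and scored by how likely the model finds it; the option with the lowest perplexity is chosen, so the model never has to produce a well-formed answer. In generation mode, the model writes free text and a letter must be extracted from it, which can fail or misfire. The per-token log-probabilities are hypothetical inputs standing in for real model output; this is not OpenCompass code, and its exact normalization may differ.

```python
import math
import re
from typing import Dict, List, Optional


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative log-probability over an option's tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def choose_by_perplexity(option_logprobs: Dict[str, List[float]]) -> str:
    """PPL mode: pick the option the model finds least surprising after the prompt."""
    return min(option_logprobs, key=lambda letter: perplexity(option_logprobs[letter]))


def choose_by_generation(generated_text: str) -> Optional[str]:
    """Gen mode: take the first standalone letter A-D in free-form output, or None."""
    match = re.search(r"\b([ABCD])\b", generated_text)
    return match.group(1) if match else None
```

The extraction step shows why generation mode can score lower: an answer that begins "A major triad..." is parsed as option A, and an answer with no letter at all counts as wrong.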
## Citation
If you find our work helpful, please cite our paper:
```bibtex
@misc{yuan2024chatmusician,
title={ChatMusician: Understanding and Generating Music Intrinsically with LLM},
author={Ruibin Yuan and Hanfeng Lin and Yi Wang and Zeyue Tian and Shangda Wu and Tianhao Shen and Ge Zhang and Yuhang Wu and Cong Liu and Ziya Zhou and Ziyang Ma and Liumeng Xue and Ziyu Wang and Qin Liu and Tianyu Zheng and Yizhi Li and Yinghao Ma and Yiming Liang and Xiaowei Chi and Ruibo Liu and Zili Wang and Pengfei Li and Jingcheng Wu and Chenghua Lin and Qifeng Liu and Tao Jiang and Wenhao Huang and Wenhu Chen and Emmanouil Benetos and Jie Fu and Gus Xia and Roger Dannenberg and Wei Xue and Shiyin Kang and Yike Guo},
year={2024},
eprint={2402.16153},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
```
## Dataset Card Contact
Authors of ChatMusician.
Provided by: maas
Created: 2024-04-14