TrustMus benchmark: The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13644329
Resource description:
TrustMus is an initial, rigorously validated benchmark for assessing the accuracy and reliability of large language models (LLMs) in musicology. It consists of 400 human-validated multiple-choice questions divided into four thematic areas: People (Ppl); Instruments and Technology (I&T); Genres, Forms, and Theory (Thr); and Culture and History (C&H).
The questions were derived from The Grove Dictionary Online with a semi-automated pipeline: a fine-tuned retrieval-augmented generation (RAG) model generates candidate questions, a series of automated checks filters them, and expert human annotators validate the remaining ones. TrustMus was introduced in the paper cited below and is intended as a resource for researchers and developers who want to evaluate and improve LLM performance in this specialized field.
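The automated filtering stage of that pipeline can be pictured as a set of predicates that every generated candidate must satisfy before it reaches human annotators. The checks below are hypothetical examples chosen for illustration, not the filters the authors actually used; the `Candidate` type, its `answer_index` field, and both function names are likewise assumptions. The RAG generation and expert annotation stages are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical shape of a generated question before validation.
    question: str
    options: list[str]
    answer_index: int   # hypothetical: position of the correct option


def passes_automated_checks(c: Candidate) -> bool:
    """Apply illustrative sanity checks (NOT the authors' actual filters)."""
    # Hypothetical check 1: the question stem is not empty.
    if not c.question.strip():
        return False
    # Hypothetical check 2: at least two options, none of them empty.
    if len(c.options) < 2 or any(not o.strip() for o in c.options):
        return False
    # Hypothetical check 3: options are distinct after case/whitespace normalization.
    normalized = [o.strip().casefold() for o in c.options]
    if len(set(normalized)) != len(normalized):
        return False
    # Hypothetical check 4: the answer index points at an existing option.
    return 0 <= c.answer_index < len(c.options)


def filter_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates passing every check; survivors would go to expert review."""
    return [c for c in candidates if passes_automated_checks(c)]
```

Expressing each check as an independent predicate keeps the filter easy to extend or audit, which matters when the goal is a benchmark whose items can be trusted.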
The benchmark is described in the following paper (BibTeX citation):
@inproceedings{ramoneda2024trustmus,
title={The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?},
author={Ramoneda, Pedro and Parada-Cabaleiro, Emilia and Weck, Benno and Serra, Xavier},
booktitle={Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)},
year={2024},
month={November},
address={San Francisco, USA},
organization={Co-located with ISMIR'2024}}
Created: 2024-09-03