TrustMus benchmark: The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/13644329

Description:

TrustMus is an initial, rigorously validated benchmark for assessing the accuracy and reliability of large language models (LLMs) in musicology. It contains 400 human-validated multiple-choice questions in four thematic areas: People (Ppl); Instruments and Technology (I&T); Genres, Forms, and Theory (Thr); and Culture and History (C&H). The questions are derived from The Grove Dictionary Online using a semi-automated methodology: a fine-tuned retrieval-augmented generation (RAG) model generates candidate questions, a series of automated checks filters them, and expert human annotators validate the questions that remain. The benchmark is introduced in the paper cited below and is intended as a resource for researchers and developers who want to evaluate and improve LLM performance in this specialized field.

BibTeX citation:

@inproceedings{ramoneda2024trustmus,
  title={The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?},
  author={Ramoneda, Pedro and Parada-Cabaleiro, Emilia and Weck, Benno and Serra, Xavier},
  booktitle={Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)},
  year={2024},
  month={November},
  address={San Francisco, USA},
  organization={Co-located with ISMIR'2024}
}
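Because the benchmark consists of multiple-choice questions grouped into four thematic areas, a natural way to use it is to report accuracy per category as well as overall. The following is a minimal sketch of that scoring step only. The record layout (keys "id", "category", "answer") and the toy data are assumptions made for illustration; they are not the actual TrustMus file format, which should be checked against the Zenodo record.

```python
from collections import defaultdict

# Category abbreviations as given in the dataset description.
CATEGORIES = ("Ppl", "I&T", "Thr", "C&H")


def accuracy_by_category(questions, predictions):
    """Return {category: accuracy} plus an "overall" entry.

    questions: list of dicts with hypothetical keys "id", "category", "answer"
    predictions: dict mapping question id -> predicted answer label
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        total[q["category"]] += 1
        # A missing prediction counts as a wrong answer.
        if predictions.get(q["id"]) == q["answer"]:
            correct[q["category"]] += 1
    # Only report categories that actually contain questions.
    scores = {c: correct[c] / total[c] for c in CATEGORIES if total[c]}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Toy records invented for illustration; not taken from TrustMus.
toy_questions = [
    {"id": 1, "category": "Ppl", "answer": "A"},
    {"id": 2, "category": "Ppl", "answer": "C"},
    {"id": 3, "category": "I&T", "answer": "B"},
    {"id": 4, "category": "C&H", "answer": "D"},
]
toy_predictions = {1: "A", 2: "B", 3: "B"}  # question 4 left unanswered

scores = accuracy_by_category(toy_questions, toy_predictions)
print(scores)
```

Counting unanswered questions as wrong is a design choice of this sketch; an evaluation could instead report them separately.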

Created:

2024-09-03