HeartMuLa-Benchmark
ModelScope community | Updated 2026-05-11 | Indexed 2026-05-10
Download link:
https://modelscope.cn/datasets/HeartMuLa/HeartMuLa-Benchmark
Resource description:
# HeartMuLa-Benchmark
## Overview
This benchmark is designed to evaluate music generation models, specifically for use in our **HeartMuLa** project.
It provides a standardized dataset and evaluation metrics to quantify model performance on multi-language, multi-label music generation tasks.
## Dataset
The dataset contains samples in five languages:
- Chinese
- English
- Japanese
- Korean
- Spanish
Each language folder contains multiple subfolders, each including:
- `tags.txt`: AI-generated music tags, which can be used as conditional input for music generation models.
- `lyrics.txt`: AI-generated lyrics, also usable as model input.
These files serve as standardized conditions for model evaluation.
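The layout described above can be walked with a short loader. This is a minimal sketch, not an official loader: it assumes the dataset has been downloaded to a local directory whose top-level folders are the languages, and the names `Condition` and `iter_conditions` are hypothetical. The actual folder names in the dataset are not stated here, so the code reads whatever directory names it finds.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Condition(NamedTuple):
    """One evaluation condition: the tags and lyrics of a single sample."""
    language: str
    sample_id: str
    tags: str
    lyrics: str


def iter_conditions(root: Path) -> Iterator[Condition]:
    """Yield one Condition per sample subfolder containing both files.

    Assumed layout (hypothetical names): <root>/<language>/<sample>/tags.txt
    and <root>/<language>/<sample>/lyrics.txt.
    """
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in lang_dir.iterdir() if p.is_dir()):
            tags_file = sample_dir / "tags.txt"
            lyrics_file = sample_dir / "lyrics.txt"
            # Skip folders that do not hold a complete condition pair.
            if not (tags_file.is_file() and lyrics_file.is_file()):
                continue
            yield Condition(
                language=lang_dir.name,
                sample_id=sample_dir.name,
                tags=tags_file.read_text(encoding="utf-8").strip(),
                lyrics=lyrics_file.read_text(encoding="utf-8").strip(),
            )
```

Each yielded `Condition` can then be passed to the model under test, with `tags` and `lyrics` as its conditional inputs.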
## Evaluation
### Lyrics Evaluation
1. Use **HeartTranscriptor** (Link: https://modelscope.cn/models/HeartMuLa/HeartTranscriptor-oss) to transcribe the model-generated music into lyrics.
2. Compute **Word Error Rate (WER)** or **Phoneme Error Rate (PER)** between the transcription and the target lyrics to measure how closely the generated music follows them.
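The error-rate step can be sketched with a plain edit-distance computation. This is an illustration under assumptions, not the benchmark's official scoring script: the function names are hypothetical, tokenization is a simple lowercase whitespace split, and the transcription is assumed to be already available as a string. For PER, the same `error_rate` function would be applied to phoneme sequences; the phonemizer needed to produce them is not specified in this document.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate with lowercase, whitespace tokenization (an assumption)."""
    return error_rate(reference.lower().split(), hypothesis.lower().split())
```

Whitespace tokenization does not suit languages written without spaces between words, such as Chinese and Japanese. For those, passing `list(text)` to `error_rate` gives a character-level rate, which is a common substitute but is an assumption here rather than something the benchmark prescribes.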
### Tag Evaluation
1. Use Tencent’s **MuQ-MuLan** to extract embeddings for both the generated music and the reference tags.
2. Compute **Cosine Similarity** to measure semantic alignment between the generated music and target tags.
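The similarity step reduces to a single formula once the two embeddings exist. The sketch below assumes the audio and tag embeddings have already been extracted and are available as equal-length sequences of floats; it does not call MuQ-MuLan, and the function name `cosine_similarity` is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A value near 1 indicates that the generated audio and the target tags point in nearly the same direction in the embedding space, while values near 0 indicate little semantic alignment.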
Provider:
maas
Created:
2026-01-20



