
symurbench_datasets

ModelScope community · Updated 2025-09-16 · Indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/ai-forever/symurbench_datasets
Resource description:
# SyMuRBench Datasets and Precomputed Features

This repository contains datasets and precomputed features for [SyMuRBench](https://github.com/Mintas/SyMuRBench), a benchmark for symbolic music understanding models. It includes metadata and MIDI files for multiple classification and retrieval tasks, along with pre-extracted **music21** and **jSymbolic** features.

You can install and use the full pipeline via:
👉 [https://github.com/Mintas/SyMuRBench](https://github.com/Mintas/SyMuRBench)

---

## Overview

SyMuRBench supports evaluation across diverse symbolic music tasks, including composer, genre, emotion, and instrument classification, as well as score-performance retrieval.

This Hugging Face dataset provides:

- Dataset metadata (CSV files)
- MIDI files organized by task
- Precomputed **music21** and **jSymbolic** features
- Configuration-ready structure for immediate use in benchmarking

---

## Tasks Description

| Task Name | Source Dataset | Task Type | # of Classes | # of Files | Default Metrics |
|----------|----------------|-----------|--------------|------------|-----------------|
| ComposerClassificationASAP | ASAP | Multiclass Classification | 7 | 197 | weighted f1 score, balanced accuracy |
| GenreClassificationMMD | MetaMIDI | Multiclass Classification | 7 | 2,795 | weighted f1 score, balanced accuracy |
| GenreClassificationWMTX | WikiMT-X | Multiclass Classification | 8 | 985 | weighted f1 score, balanced accuracy |
| EmotionClassificationEMOPIA | Emopia | Multiclass Classification | 4 | 191 | weighted f1 score, balanced accuracy |
| EmotionClassificationMIREX | MIREX | Multiclass Classification | 5 | 163 | weighted f1 score, balanced accuracy |
| InstrumentDetectionMMD | MetaMIDI | Multilabel Classification | 128 | 4,675 | weighted f1 score |
| ScorePerformanceRetrievalASAP | ASAP | Retrieval | - | 438 (219 pairs) | R@1, R@5, R@10, Median Rank |

---

## Precomputed Features

Precomputed features are available in the `data/features/` folder:

- `music21_full_dataset.parquet`
- `jsymbolic_full_dataset.parquet`

Each file contains a unified table with:

- `midi_file`: Filename of the MIDI
- `task`: Corresponding task name
- `E_0` to `E_N`: Feature vector

### Example

| midi_file | task | E_0 | E_1 | ... | E_672 | E_673 |
|----------|------|-----|-----|-----|-------|-------|
| Q1_0vLPYiPN7qY_1.mid | EmotionClassificationEMOPIA | 0.0 | 0.0 | ... | 0.0 | 0.0 |
| Q1_4dXC1cC7crw_0.mid | EmotionClassificationEMOPIA | 0.0 | 0.0 | ... | 0.0 | 0.0 |

## File Structure

The dataset is distributed as a ZIP archive: `data/datasets.zip`

After extraction, the structure is:

```
datasets/
├── composer_and_retrieval_datasets/
│   ├── metadata_composer_dataset.csv
│   ├── metadata_retrieval_dataset.csv
│   └── ... (MIDI files organized in subfolders)
├── genre_dataset/
│   ├── metadata_genre_dataset.csv
│   └── midis/
├── wikimtx_dataset/
│   ├── metadata_wikimtx_dataset.csv
│   └── midis/
├── emopia_dataset/
│   ├── metadata_emopia_dataset.csv
│   └── midis/
├── mirex_dataset/
│   ├── metadata_mirex_dataset.csv
│   └── midis/
└── instrument_dataset/
    ├── metadata_instrument_dataset.csv
    └── midis/
```

* CSV files: Contain `filename` and `label` (or pair info for retrieval).
* MIDI files: Used as input for feature extractors.

---

## How to Use

You can download and extract everything using the built-in utility:

```python
from symurbench.utils import load_datasets

load_datasets(output_folder="./data", load_features=True)
```

This will:

* Download datasets.zip and extract it
* Optionally download precomputed features
* Update config paths automatically

---

## License

This dataset is released under the MIT License.

---

## Citation

If you use SyMuRBench in your work, please cite:

```bibtex
@inproceedings{symurbench2025,
  author    = {Petr Strepetov and Dmitrii Kovalev},
  title     = {SyMuRBench: Benchmark for Symbolic Music Representations},
  booktitle = {Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice (McGE '25)},
  year      = {2025},
  pages     = {9},
  publisher = {ACM},
  address   = {Dublin, Ireland},
  doi       = {10.1145/3746278.3759392}
}
```
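
---

## Example: Inspecting a Metadata CSV

As a quick sanity check after extraction, the per-label file counts in a metadata CSV can be compared against the Tasks Description table. The sketch below is not part of SyMuRBench. It assumes only what the README states, namely that a classification metadata CSV has `filename` and `label` columns. The helper name `label_counts` and the path under `./data` are illustrative assumptions; the actual location depends on where the archive was extracted. The multilabel instrument task and the retrieval task may use a different label layout, so this is intended for the multiclass tasks only.

```python
import csv
from collections import Counter
from pathlib import Path


def label_counts(metadata_csv):
    """Count files per label in a metadata CSV.

    Assumes the CSV has a header row with `filename` and `label` columns,
    as described in the README. `label_counts` is a hypothetical helper,
    not a SyMuRBench function.
    """
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter(row["label"] for row in reader)


if __name__ == "__main__":
    # Assumed path: adjust to wherever datasets.zip was extracted.
    path = Path("./data/datasets/emopia_dataset/metadata_emopia_dataset.csv")
    if path.exists():
        counts = label_counts(path)
        for label, n in sorted(counts.items()):
            print(f"{label}: {n}")
        print("total files:", sum(counts.values()))
```

If the total does not match the "# of Files" column for the task, the archive may be incomplete or the metadata layout may differ from the assumption above.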

Provider:
maas
Created:
2025-08-16