m-a-p/Music-Instruct

Source: Hugging Face. Updated 2023-10-12; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/m-a-p/Music-Instruct
Resource description:
---
license: cc-by-nc-4.0
---

# Music Instruct (MI) Dataset

This is the dataset used to train and evaluate the MusiLingo model. It contains Q&A pairs about individual musical compositions, tailored for open-ended music queries. It is derived from the music-caption pairs in the MusicCaps dataset and was created by applying prompt engineering and few-shot learning to GPT-4. More details on dataset generation can be found in our paper *[MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response](https://arxiv.org/abs/2309.08730)*.

The resulting MI dataset consists of two versions. v1 (short questions) has 27,540 Q&A pairs that ask for details about musical snippets, including but not limited to emotion, instrument, vocal track, tempo, and genre, and often yield concise one- or two-sentence responses. In contrast, v2 (long questions) comprises 32,953 Q&A pairs featuring more general questions about the musical pieces, which typically yield longer responses that paraphrase the original caption.

## Evaluation and Dataset Split

You can use all (or only the long or short partition of) the Q&A pairs whose audio is in the training split of AudioSet as the MI training set, and use the short QA and long QA pairs whose audio is in the evaluation split of AudioSet as two separate test sets.

```
# training set
ds_mixed_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='all')
ds_long_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='long')
ds_short_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='short')

# testing set for short QA
ds_short = MIDataset(processor, '/content/drive/MyDrive/music_data', split='test', question_type='short')
# testing set for long QA
ds_long = MIDataset(processor, '/content/drive/MyDrive/music_data', split='test', question_type='long')
```

The evaluation metrics include BLEU, METEOR, ROUGE, and BERTScore.

## Citation

```
@article{deng2023musilingo,
  title={MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response},
  author={Deng, Zihao and Ma, Yinghao and Liu, Yudong and Guo, Rongchen and Zhang, Ge and Chen, Wenhu and Huang, Wenhao and Benetos, Emmanouil},
  journal={arXiv preprint arXiv:2309.08730},
  year={2023}
}
```
Provider:
m-a-p
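Summary of Original Information

Music Instruct (MI) Dataset

Overview

The Music Instruct (MI) dataset is used to train and evaluate the MusiLingo model. It contains Q&A pairs about musical compositions, tailored for open-ended music queries. The dataset is derived from the music-caption pairs in the MusicCaps dataset and was generated by applying prompt engineering and few-shot learning to GPT-4.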

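Dataset Versions

  • v1 (short questions): 27,540 Q&A pairs asking for details about musical snippets, such as emotion, instrument, vocal track, tempo, and genre, usually yielding concise one- or two-sentence answers.
  • v2 (long questions): 32,953 Q&A pairs with more general questions about the musical pieces, usually yielding longer answers that paraphrase the original caption.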

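Evaluation and Dataset Split

The dataset can be divided into a training set and test sets:

  • Training set: all Q&A pairs (or only the long or short partition) whose audio is in the training split of AudioSet.
  • Test sets: the short Q&A pairs and the long Q&A pairs whose audio is in the evaluation split of AudioSet, used as two separate test sets.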
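The partitioning described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the record fields (`split`, `question_type`) and the helper `select_pairs` are hypothetical and do not reflect the dataset's actual schema or the `MIDataset` implementation.

```python
from typing import Dict, List

# Hypothetical record layout: the keys used here are illustrative assumptions,
# not the real schema of the MI dataset.
QAPair = Dict[str, str]


def select_pairs(pairs: List[QAPair], split: str, question_type: str = "all") -> List[QAPair]:
    """Return the Q&A pairs in `split`, optionally restricted to one question type."""
    if split not in {"train", "test"}:
        raise ValueError(f"unknown split: {split!r}")
    if question_type not in {"all", "short", "long"}:
        raise ValueError(f"unknown question_type: {question_type!r}")
    return [
        pair
        for pair in pairs
        if pair["split"] == split
        and (question_type == "all" or pair["question_type"] == question_type)
    ]


# Toy data standing in for real Q&A pairs.
toy_pairs = [
    {"split": "train", "question_type": "short", "question": "What is the tempo?"},
    {"split": "train", "question_type": "long", "question": "Describe the piece."},
    {"split": "test", "question_type": "short", "question": "Which instrument leads?"},
    {"split": "test", "question_type": "long", "question": "Summarize the music."},
]

mixed_train = select_pairs(toy_pairs, "train")        # short and long training pairs
short_test = select_pairs(toy_pairs, "test", "short")  # short-question test set
long_test = select_pairs(toy_pairs, "test", "long")    # long-question test set
```

Keeping the two test sets separate mirrors the setup in the card, where short and long questions are evaluated independently.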

Training set example

```python
# `MIDataset` and `processor` are assumed to be defined elsewhere; this card does not show them.
ds_mixed_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='all')
ds_long_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='long')
ds_short_train = MIDataset(processor, '/content/drive/MyDrive/music_data', split='train', question_type='short')
```

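Test set example

```python
# `MIDataset` and `processor` are assumed to be defined elsewhere; this card does not show them.
ds_short = MIDataset(processor, '/content/drive/MyDrive/music_data', split='test', question_type='short')
ds_long = MIDataset(processor, '/content/drive/MyDrive/music_data', split='test', question_type='long')
```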

Evaluation metrics

The evaluation metrics include BLEU, METEOR, ROUGE, and BERTScore.
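To give a feel for the overlap-based metrics listed above, here is a minimal sketch of a ROUGE-L style F1 score computed from the longest common subsequence of two token lists. It is a simplified illustration, not the evaluation script behind this dataset: the whitespace tokenization, lowercasing, plain F1 weighting, and the function names `lcs_length` and `rouge_l_f1` are all assumptions made for this example.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L F1 over lowercased, whitespace-split tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up reference answer and model answer, for illustration only.
score = rouge_l_f1("the piano plays a slow melody", "a slow piano melody")
```

In this toy case the longest common subsequence is "a slow melody" (3 tokens), so precision is 3/4, recall is 3/6, and the F1 score is 0.6. For reported results, an established metric implementation is preferable to a hand-rolled one like this.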

Citation

```
@article{deng2023musilingo,
  title={MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response},
  author={Deng, Zihao and Ma, Yinghao and Liu, Yudong and Guo, Rongchen and Zhang, Ge and Chen, Wenhu and Huang, Wenhao and Benetos, Emmanouil},
  journal={arXiv preprint arXiv:2309.08730},
  year={2023}
}
```
