
MusicPile-sft

Source: ModelScope community. Updated 2025-11-12; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/m-a-p/MusicPile-sft
Resource description:
[**🌐 DemoPage**](https://ezmonyi.github.io/ChatMusician/) | [**🤗 Pretrain Dataset**](https://huggingface.co/datasets/m-a-p/MusicPile) | [**🤗 Benchmark**](https://huggingface.co/datasets/m-a-p/MusicTheoryBench) | [**📖 arXiv**](http://arxiv.org/abs/2402.16153) | [💻 **Code**](https://github.com/hf-lin/ChatMusician) | [**🤖 Chat Model**](https://huggingface.co/m-a-p/ChatMusician) | [**🤖 Base Model**](https://huggingface.co/m-a-p/ChatMusician-Base)

# Dataset Card for MusicPile-sft

*MusicPile-sft* is a subset of [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile). It contains **1.14M** samples, with a ratio of music verbal data to music score data (ABC notation) of roughly 2:1. Here is the overview:

| Datasets | Sourced from | # Samples | Category | Format |
| --- | --- | --- | --- | --- |
| [IrishMAN](https://huggingface.co/datasets/sander-wood/irishman) | public dataset + Human-written Instructions | 340K | music score | chat |
| [KernScores](http://kern.ccarh.org) | public dataset + Human-written Instructions | 10K | music score | chat |
| [JSB Chorales](https://github.com/sander-wood/deepchoir) | public dataset + Human-written Instructions | 33.5K | music score | chat |
| music knowledge | Generated with GPT-4 | 255K | music verbal | chat |
| music summary | Generated with GPT-4 | 500K | music verbal | chat |

Note: The JSB Chorales data is repeated 100 times, because very little data on Bach-style compositions is available.

You can load it with the `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("m-a-p/MusicPile-sft")
```

## Languages

*MusicPile-sft* primarily contains English.

## Dataset Structure

*MusicPile-sft* has 5 fields: `id`, `src`, `input`, `instruction` and `output`.

## Citation

If you find our work helpful, please cite it:

```
@misc{yuan2024chatmusician,
      title={ChatMusician: Understanding and Generating Music Intrinsically with LLM},
      author={Ruibin Yuan and Hanfeng Lin and Yi Wang and Zeyue Tian and Shangda Wu and Tianhao Shen and Ge Zhang and Yuhang Wu and Cong Liu and Ziya Zhou and Ziyang Ma and Liumeng Xue and Ziyu Wang and Qin Liu and Tianyu Zheng and Yizhi Li and Yinghao Ma and Yiming Liang and Xiaowei Chi and Ruibo Liu and Zili Wang and Pengfei Li and Jingcheng Wu and Chenghua Lin and Qifeng Liu and Tao Jiang and Wenhao Huang and Wenhu Chen and Emmanouil Benetos and Jie Fu and Gus Xia and Roger Dannenberg and Wei Xue and Shiyin Kang and Yike Guo},
      year={2024},
      eprint={2402.16153},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```

## Dataset Card Contact

Authors of ChatMusician.
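
As a quick sanity check on the headline figures in the overview table, the sketch below sums the per-source sample counts and compares the two categories. The counts are copied from the table; the names `SAMPLES_K` and `totals_by_category` are illustrative only and are not part of the dataset or of any library.

```python
# Per-source sample counts in thousands, copied from the overview table.
# The dictionary and helper names are hypothetical, chosen for this sketch.
SAMPLES_K = {
    "IrishMAN": ("music score", 340.0),
    "KernScores": ("music score", 10.0),
    "JSB Chorales": ("music score", 33.5),
    "music knowledge": ("music verbal", 255.0),
    "music summary": ("music verbal", 500.0),
}


def totals_by_category(samples):
    """Sum the sample counts (in thousands) for each category."""
    totals = {}
    for category, count_k in samples.values():
        totals[category] = totals.get(category, 0.0) + count_k
    return totals


totals = totals_by_category(SAMPLES_K)
total_k = sum(totals.values())
ratio = totals["music verbal"] / totals["music score"]

print(f"total samples: {total_k / 1000:.2f}M")
print(f"music verbal : music score = {ratio:.2f} : 1")
```

The table entries sum to 1,138.5K samples, which rounds to the stated 1.14M, and the verbal-to-score ratio comes out slightly under 2:1.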

Provider:
maas
Created:
2024-04-14