SMuPT_v0_8192_770M

Name: SMuPT_v0_8192_770M
Creator: maas
Published: 2025-12-05 16:14:41
License: not specified

ModelScope community; updated 2025-12-05; indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/m-a-p/SMuPT_v0_8192_770M


Resource description:

<div align="center"> <img src="Yi_logo.svg" width="150px" style="display: inline-block;"> <img src="m-a-p.png" width="150px" style="display: inline-block;"> </div>

## SMuPT: Symbolic Music Generative Pre-trained Transformer

SMuPT is a series of pre-trained models for symbolic music generation. The models were trained on a large-scale dataset of symbolic music, including millions of monophonic and polyphonic pieces from different genres and styles. They use the LLaMA2 architecture and can be further used for downstream music generation tasks such as melody generation, accompaniment generation, and multi-track music generation.

- 09/01/2024: a series of pre-trained SMuPT models is released, with parameters ranging from 110M to 1.3B.

## Model architecture

The architecture details of the SMuPT-v0 models are listed below:

| Name | Parameters | Training Data (Music Pieces) | Seq Length | Hidden Size | Layers | Heads |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| SMuPT-v0-8192-110M | 110M | 7M x 5.8 epochs | 8192 | 768 | 12 | 12 |
| SMuPT-v0-8192-345M | 345M | 7M x 4 epochs | 8192 | 1024 | 24 | 16 |
| SMuPT-v0-8192-770M | 770M | 7M x 3 epochs | 8192 | 1280 | 36 | 20 |
| SMuPT-v0-8192-1.3B | 1.3B | 7M x 2.2 epochs | 8192 | 1536 | 48 | 24 |

## Model Usage

There are several ways to use our pre-trained SMuPT models. Here we describe the usage based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main). Hugging Face format will be supported soon.

Before starting, make sure you have set up the relevant environment and codebase.

```shell
# pull the Megatron-LM codebase
mkdir -p /path/to/workspace && cd /path/to/workspace
git clone https://github.com/NVIDIA/Megatron-LM.git

# download the pre-trained SMuPT model checkpoint and vocab files from the Hugging Face page
# (URLs are quoted so the shell does not treat "?" as a glob character)
mkdir -p /models/SMuPT_v0_8192_1.3B && cd /models/SMuPT_v0_8192_1.3B
wget -O model_optim_rng.pt "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/model_optim_rng.pt?download=true"
wget -O newline.vocab "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/newline.vocab?download=true"
wget -O newline.txt "https://huggingface.co/m-a-p/SMuPT_v0_8192_1.3B/resolve/main/newline.txt?download=true"
```

We recommend using a recent version of [NGC's PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for SMuPT inference. See more details in [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main).

```shell
# pull NGC's PyTorch container, mount the current directory as /workspace, and enter the container
docker run --gpus all -it --name megatron --shm-size=16g -v $PWD:/workspace -p 5000:5000 nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash
```

Once you are inside the container, you can start a REST server for inference.

<details>
<summary>Click to expand the example script</summary>

```shell
#!/bin/bash
# This example will start serving the 1.3B model.
export CUDA_DEVICE_MAX_CONNECTIONS=1

# torchrun launcher settings: a single process on a single node
DISTRIBUTED_ARGS="--nproc_per_node 1 \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

CHECKPOINT=/path/to/model/checkpoint/folder
VOCAB_FILE=/path/to/vocab/file
MERGE_FILE=/path/to/merge/file

# pick the architecture hyperparameters that match the checkpoint
MODEL_SIZE="1.3B"
if [[ ${MODEL_SIZE} == "110M" ]]; then
    HIDDEN_SIZE=768; NUM_HEAD=12; NUM_QUERY_GROUP=12; NUM_LAYERS=12; FFN_HIDDEN_SIZE=3072; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "345M" ]]; then
    HIDDEN_SIZE=1024; NUM_HEAD=16; NUM_QUERY_GROUP=16; NUM_LAYERS=24; FFN_HIDDEN_SIZE=4096; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "770M" ]]; then
    HIDDEN_SIZE=1280; NUM_HEAD=20; NUM_QUERY_GROUP=20; NUM_LAYERS=36; FFN_HIDDEN_SIZE=5120; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "1.3B" ]]; then
    HIDDEN_SIZE=1536; NUM_HEAD=24; NUM_QUERY_GROUP=24; NUM_LAYERS=48; FFN_HIDDEN_SIZE=6144; NORM_EPS=1e-5;
else
    echo "invalid MODEL_SIZE: ${MODEL_SIZE}"; exit 1
fi
MAX_SEQ_LEN=8192
MAX_POSITION_EMBEDDINGS=8192

# the REST server depends on flask-restful
pip install flask-restful

torchrun $DISTRIBUTED_ARGS tools/run_text_generation_server.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --num-layers ${NUM_LAYERS} \
    --hidden-size ${HIDDEN_SIZE} \
    --ffn-hidden-size ${FFN_HIDDEN_SIZE} \
    --load ${CHECKPOINT} \
    --group-query-attention \
    --num-query-groups ${NUM_QUERY_GROUP} \
    --position-embedding-type rope \
    --num-attention-heads ${NUM_HEAD} \
    --max-position-embeddings ${MAX_POSITION_EMBEDDINGS} \
    --tokenizer-type GPT2BPETokenizer \
    --normalization RMSNorm \
    --norm-epsilon ${NORM_EPS} \
    --make-vocab-size-divisible-by 1 \
    --swiglu \
    --use-flash-attn \
    --bf16 \
    --micro-batch-size 1 \
    --disable-bias-linear \
    --no-bias-gelu-fusion \
    --untie-embeddings-and-output-weights \
    --seq-length ${MAX_SEQ_LEN} \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --attention-dropout 0.0 \
    --hidden-dropout 0.0 \
    --weight-decay 1e-1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --adam-beta2 0.95 \
    --adam-eps 1e-8 \
    --seed 42
```

</details>

Use curl to query the server directly. Note that the newline character `\n` is represented by `<n>` in the vocabulary, so newlines must be replaced with `<n>` in the prompt, and `<n>` must be replaced with newlines in the generated tokens.

```shell
# Megatron-LM's text generation server listens on port 5000 by default, which is the
# port published by the docker run command above; 6000 is only the torchrun master port.
curl 'http://localhost:5000/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8' -d '{"prompts":["X:1<n>L:1/8<n>M:4/4<n>K:G<n>GA"], "tokens_to_generate":4096}'
```

Processed Output:

```shell
X:1
L:1/8
M:4/4
K:G
GA | B2 B2 B2 (cd) | B2 A2 z2 AB | c2 c2 c2 (de) | d4 z2 B2 | d2 d2 d2 e>d | c2 B2 z2 dB | A2 A2 A2 B2 | G4 z2 GA | B2 B2 B2 cd | B2 A2 z2 AB | c2 c2 e2 dc | d4 z2 GA | B2 B2 B2 cd | B2 A2 z2 dB | A3 G A2 B2 | G4 z2 |]
```

Once you convert the generated tokens into audio, you will hear the following music.

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/640701cb4dc5f2846c91d4eb/Ows-HvaSuZfqAZvOjT4LX.mpga"></audio>
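The newline replacement described above is easy to get wrong by hand, so it may help to script it. The following is a minimal sketch, not part of the SMuPT or Megatron-LM code: the function names `abc_to_prompt`, `prompt_to_abc`, and `query_smupt` are hypothetical, and the JSON body simply mirrors the curl example. It assumes the prompt contains no double quotes or backslashes, since no JSON escaping is performed.

```shell
#!/bin/bash
# Hypothetical helpers for talking to the SMuPT REST server.
# They are illustrative only and are not shipped with SMuPT or Megatron-LM.

# Read an ABC tune on stdin and join its lines with the literal token <n>.
abc_to_prompt() {
    awk 'NR > 1 { printf "<n>" } { printf "%s", $0 }'
}

# Read model output on stdin and turn every <n> token back into a newline.
prompt_to_abc() {
    awk '{ gsub(/<n>/, "\n"); print }'
}

# Send one encoded prompt to the server.
#   $1 = full URL of the /api endpoint (wherever your server listens)
#   $2 = prompt already encoded with abc_to_prompt
#   $3 = number of tokens to generate (defaults to 4096, as in the curl example)
# Assumption: the prompt has no double quotes or backslashes (no JSON escaping).
query_smupt() {
    curl -s "$1" -X PUT \
        -H 'Content-Type: application/json; charset=UTF-8' \
        -d "{\"prompts\":[\"$2\"], \"tokens_to_generate\":${3:-4096}}"
}
```

For example, `printf 'X:1\nL:1/8\nM:4/4\nK:G\nGA\n' | abc_to_prompt` produces the prompt string used in the curl command, which can then be passed to `query_smupt` together with the server URL. The server replies with JSON; once the generated string has been extracted from it, piping that string through `prompt_to_abc` restores the line breaks.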


Provider: maas
Created: 2024-04-14
