StepEval-Audio-360
Source: ModelScope community. Updated 2026-05-11; indexed 2025-02-22.
# StepEval-Audio-360
## Dataset Description
StepEval Audio 360 is a comprehensive dataset that evaluates the ability of multi-modal large language models (MLLMs) in human-AI audio interaction. This audio benchmark dataset, sourced from professional human annotators, covers a full spectrum of capabilities: singing, creativity, role-playing, logical reasoning, voice understanding, voice instruction following, gaming, speech emotion control, and language ability.
## Languages
StepEval Audio 360 comprises human voice recordings in several languages and dialects, including Chinese (Szechuan dialect and Cantonese), English, and Japanese. It contains both audio and transcription data.
## Links
- Homepage: [Step-Audio](https://github.com/stepfun-ai/Step-Audio)
- Paper: [Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction](https://arxiv.org/abs/2502.11946)
- ModelScope: https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360
- Step-Audio Model Suite:
  - Step-Audio-Tokenizer:
    - Hugging Face: https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer
    - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer
  - Step-Audio-Chat:
    - Hugging Face: https://huggingface.co/stepfun-ai/Step-Audio-Chat
    - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-Chat
  - Step-Audio-TTS-3B:
    - Hugging Face: https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B
    - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B
## User Manual
* Download the dataset
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360
cd StepEval-Audio-360
git lfs pull
```
* Decompress audio data
```
mkdir audios
tar -xvf audios.tar.gz -C audios
```
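Where a shell `tar` is unavailable, the same extraction step can be sketched in Python with the standard-library `tarfile` module. The function name `extract_audios` and its default arguments are illustrative assumptions, not part of the dataset tooling; the internal layout of `audios.tar.gz` is not documented here, so the sketch only returns whatever files the archive yields.

```python
import tarfile
from pathlib import Path


def extract_audios(archive_path="audios.tar.gz", dest_dir="audios"):
    """Extract a gzip-compressed tar archive into dest_dir.

    Returns the sorted list of files found under dest_dir afterwards.
    (Hypothetical helper; mirrors `mkdir audios && tar -xvf ... -C audios`.)
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Refuse entries that would be written outside dest_dir.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

Calling `extract_audios()` from inside the cloned `StepEval-Audio-360` directory should leave the audio files under `audios/`, as the shell commands above do.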
* How to use
```
from datasets import load_dataset

dataset = load_dataset("stepfun-ai/StepEval-Audio-360")
dataset = dataset["test"]
for item in dataset:
    conversation_id = item["conversation_id"]
    category = item["category"]
    conversation = item["conversation"]

    # parse multi-turn dialogue data
    for turn in conversation:
        role = turn["role"]
        text = turn["text"]
        audio_filename = turn["audio_filename"]  # refer to decompressed audio file
        if audio_filename is not None:
            print(role, text, audio_filename)
        else:
            print(role, text)
```
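As a quick sanity check after download and extraction, the loop above can be extended to count conversations per category and to list referenced audio files that are missing on disk. The sketch below is a minimal illustration: `summarize` is a hypothetical helper, and it assumes that each `audio_filename` is a path relative to the `audios` extraction directory, which the dataset card does not state explicitly.

```python
from collections import Counter
from pathlib import Path


def summarize(items, audio_root="audios"):
    """Count conversations per category and collect missing audio files.

    `items` is any iterable of records with the fields used above:
    conversation_id, category, and conversation (a list of turns with
    role, text, audio_filename).
    """
    root = Path(audio_root)
    per_category = Counter()
    missing = []
    for item in items:
        per_category[item["category"]] += 1
        for turn in item["conversation"]:
            name = turn["audio_filename"]
            if name is None:
                continue  # text-only turn, nothing to check
            # Assumption: audio_filename is relative to the extraction directory.
            if not (root / name).is_file():
                missing.append((item["conversation_id"], name))
    return per_category, missing
```

Passing the `dataset` object from the snippet above, `summarize(dataset)` returns a `Counter` keyed by category and a list of `(conversation_id, audio_filename)` pairs that could not be found.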
## Licensing
This dataset project is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
## Citation
If you use this dataset, please cite it using the BibTeX entry below.
```
@misc {stepfun_2025,
author = { {StepFun} },
title = { StepEval-Audio-360 (Revision 72a072e) },
year = 2025,
url = { https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 },
doi = { 10.57967/hf/4528 },
publisher = { Hugging Face }
}
```
Provider: maas
Created: 2025-02-16



