StepEval-Audio-360

Name: StepEval-Audio-360
Creator: maas
Published: 2026-05-11 19:09:12
License: 暂无描述

魔搭社区2026-05-11 更新2025-02-22 收录

下载链接：

https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360

下载链接

链接失效反馈

官方服务：

资源简介：

# StepEval-Audio-360 ## Dataset Description StepEval Audio 360 is a comprehensive dataset that evaluates the ability of multi-modal large language models (MLLMs) in human-AI audio interaction. This audio benchmark dataset, sourced from professional human annotators, covers a full spectrum of capabilities: singing, creativity, role-playing, logical reasoning, voice understanding, voice instruction following, gaming, speech emotion control, and language ability. ## Languages StepEval Audio 360 comprises about human voice recorded in different languages and dialects, including Chinese(Szechuan dialect and cantonese), English, and Japanese. It contains both audio and transcription data. ## Links - Homepage: [Step-Audio](https://github.com/stepfun-ai/Step-Audio) - Paper: [Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction ](https://arxiv.org/abs/2502.11946) - ModelScope: https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360 - Step-Audio Model Suite： - Step-Audio-Tokenizer： - Hugging Face：https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer - ModelScope：https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer - Step-Audio-Chat : - HuggingFace: https://huggingface.co/stepfun-ai/Step-Audio-Chat - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-Chat - Step-Audio-TTS-3B： - Hugging Face: https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B ## User Manual * Download the dataset ``` # Make sure you have git-lfs installed (https://git-lfs.com) git lfs install git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 cd StepEval-Audio-360 git lfs pull ``` * Decompress audio data ``` mkdir audios tar -xvf audios.tar.gz -C audios ``` * How to use ``` from datasets import load_dataset dataset = load_dataset("stepfun-ai/StepEval-Audio-360") dataset = dataset["test"] for item in dataset: conversation_id = item["conversation_id"] category = item["category"] conversation = item["conversation"] # parse multi-turn dialogue data for turn in conversation: role = turn["role"] text = turn["text"] audio_filename = turn["audio_filename"] # refer to decompressed audio file if audio_filename is not None: print(role, text, audio_filename) else: print(role, text) ``` ## Licensing This dataset project is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). ## Citation If you utilize this dataset, please cite it using the BibTeX provided. ``` @misc {stepfun_2025, author = { {StepFun} }, title = { StepEval-Audio-360 (Revision 72a072e) }, year = 2025, url = { https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 }, doi = { 10.57967/hf/4528 }, publisher = { Hugging Face } } ```

# StepEval-Audio-360 ## 数据集说明 StepEval-Audio-360是一款综合性基准数据集，用于评估多模态大语言模型（Multi-Modal Large Language Models, MLLMs）在人机音频交互场景中的能力。该数据集由专业人工标注者构建，覆盖全维度能力评估范畴：包括歌唱、创意生成、角色扮演、逻辑推理、语音理解、语音指令遵循、游戏交互、语音情感调控以及语言能力等多个方向。 ## 语言覆盖范围 StepEval-Audio-360收录了多语言及方言录制的人类语音数据，涵盖中文（四川方言与粤语）、英语以及日语。数据集同时包含音频文件与转录文本两类数据。 ## 相关链接 - 主页：[Step-Audio](https://github.com/stepfun-ai/Step-Audio) - 论文：[Step-Audio：智能语音交互中的统一理解与生成](https://arxiv.org/abs/2502.11946) - ModelScope：https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360 - Step-Audio 模型套件： - Step-Audio 分词器（Step-Audio-Tokenizer）： - Hugging Face：https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer - ModelScope：https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer - Step-Audio 对话模型（Step-Audio-Chat）： - Hugging Face：https://huggingface.co/stepfun-ai/Step-Audio-Chat - ModelScope：https://modelscope.cn/models/stepfun-ai/Step-Audio-Chat - Step-Audio 文本转语音3B模型（Step-Audio-TTS-3B）： - Hugging Face：https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B - ModelScope：https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B ## 用户手册 * 数据集下载 # 请确保已安装git-lfs（https://git-lfs.com） git lfs install git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 cd StepEval-Audio-360 git lfs pull * 音频数据解压 mkdir audios tar -xvf audios.tar.gz -C audios * 使用方法 from datasets import load_dataset dataset = load_dataset("stepfun-ai/StepEval-Audio-360") dataset = dataset["test"] for item in dataset: conversation_id = item["conversation_id"] category = item["category"] conversation = item["conversation"] # 解析多轮对话数据 for turn in conversation: role = turn["role"] text = turn["text"] audio_filename = turn["audio_filename"] # 指向已解压的音频文件 if audio_filename is not None: print(role, text, audio_filename) else: print(role, text) ## 授权协议本数据集项目采用[Apache 2.0开源许可协议](https://www.apache.org/licenses/LICENSE-2.0)进行授权。 ## 引用方式若您使用本数据集，请通过以下BibTeX格式进行引用： @misc {stepfun_2025, author = { {StepFun} }, title = { StepEval-Audio-360 (Revision 72a072e) }, year = 2025, url = { https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 }, doi = { 10.57967/hf/4528 }, publisher = { Hugging Face } }

提供机构：

maas

创建时间：

2025-02-16

5,000+

优质数据集

54 个

任务类型

进入经典数据集