five

StepEval-Audio-360

收藏
魔搭社区2026-05-11 更新2025-02-22 收录
下载链接:
https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360
下载链接
链接失效反馈
官方服务:
资源简介:
# StepEval-Audio-360 ## Dataset Description StepEval Audio 360 is a comprehensive dataset that evaluates the ability of multi-modal large language models (MLLMs) in human-AI audio interaction. This audio benchmark dataset, sourced from professional human annotators, covers a full spectrum of capabilities: singing, creativity, role-playing, logical reasoning, voice understanding, voice instruction following, gaming, speech emotion control, and language ability. ## Languages StepEval Audio 360 comprises about human voice recorded in different languages and dialects, including Chinese(Szechuan dialect and cantonese), English, and Japanese. It contains both audio and transcription data. ## Links - Homepage: [Step-Audio](https://github.com/stepfun-ai/Step-Audio) - Paper: [Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction ](https://arxiv.org/abs/2502.11946) - ModelScope: https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360 - Step-Audio Model Suite: - Step-Audio-Tokenizer: - Hugging Face:https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer - ModelScope:https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer - Step-Audio-Chat : - HuggingFace: https://huggingface.co/stepfun-ai/Step-Audio-Chat - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-Chat - Step-Audio-TTS-3B: - Hugging Face: https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B - ModelScope: https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B ## User Manual * Download the dataset ``` # Make sure you have git-lfs installed (https://git-lfs.com) git lfs install git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 cd StepEval-Audio-360 git lfs pull ``` * Decompress audio data ``` mkdir audios tar -xvf audios.tar.gz -C audios ``` * How to use ``` from datasets import load_dataset dataset = load_dataset("stepfun-ai/StepEval-Audio-360") dataset = dataset["test"] for item in dataset: conversation_id = item["conversation_id"] category = item["category"] conversation = item["conversation"] # parse multi-turn dialogue data for turn in conversation: role = turn["role"] text = turn["text"] audio_filename = turn["audio_filename"] # refer to decompressed audio file if audio_filename is not None: print(role, text, audio_filename) else: print(role, text) ``` ## Licensing This dataset project is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). ## Citation If you utilize this dataset, please cite it using the BibTeX provided. ``` @misc {stepfun_2025, author = { {StepFun} }, title = { StepEval-Audio-360 (Revision 72a072e) }, year = 2025, url = { https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 }, doi = { 10.57967/hf/4528 }, publisher = { Hugging Face } } ```

# StepEval-Audio-360 ## 数据集说明 StepEval-Audio-360是一款综合性基准数据集,用于评估多模态大语言模型(Multi-Modal Large Language Models, MLLMs)在人机音频交互场景中的能力。该数据集由专业人工标注者构建,覆盖全维度能力评估范畴:包括歌唱、创意生成、角色扮演、逻辑推理、语音理解、语音指令遵循、游戏交互、语音情感调控以及语言能力等多个方向。 ## 语言覆盖范围 StepEval-Audio-360收录了多语言及方言录制的人类语音数据,涵盖中文(四川方言与粤语)、英语以及日语。数据集同时包含音频文件与转录文本两类数据。 ## 相关链接 - 主页:[Step-Audio](https://github.com/stepfun-ai/Step-Audio) - 论文:[Step-Audio:智能语音交互中的统一理解与生成](https://arxiv.org/abs/2502.11946) - ModelScope:https://modelscope.cn/datasets/stepfun-ai/StepEval-Audio-360 - Step-Audio 模型套件: - Step-Audio 分词器(Step-Audio-Tokenizer): - Hugging Face:https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer - ModelScope:https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer - Step-Audio 对话模型(Step-Audio-Chat): - Hugging Face:https://huggingface.co/stepfun-ai/Step-Audio-Chat - ModelScope:https://modelscope.cn/models/stepfun-ai/Step-Audio-Chat - Step-Audio 文本转语音3B模型(Step-Audio-TTS-3B): - Hugging Face:https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B - ModelScope:https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B ## 用户手册 * 数据集下载 # 请确保已安装git-lfs(https://git-lfs.com) git lfs install git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 cd StepEval-Audio-360 git lfs pull * 音频数据解压 mkdir audios tar -xvf audios.tar.gz -C audios * 使用方法 from datasets import load_dataset dataset = load_dataset("stepfun-ai/StepEval-Audio-360") dataset = dataset["test"] for item in dataset: conversation_id = item["conversation_id"] category = item["category"] conversation = item["conversation"] # 解析多轮对话数据 for turn in conversation: role = turn["role"] text = turn["text"] audio_filename = turn["audio_filename"] # 指向已解压的音频文件 if audio_filename is not None: print(role, text, audio_filename) else: print(role, text) ## 授权协议 本数据集项目采用[Apache 2.0开源许可协议](https://www.apache.org/licenses/LICENSE-2.0)进行授权。 ## 引用方式 若您使用本数据集,请通过以下BibTeX格式进行引用: @misc {stepfun_2025, author = { {StepFun} }, title = { StepEval-Audio-360 (Revision 72a072e) }, year = 2025, url = { https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360 }, doi = { 10.57967/hf/4528 }, publisher = { Hugging Face } }
提供机构:
maas
创建时间:
2025-02-16
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作