final-audio-3
Source: ModelScope community · Updated 2025-10-22 · Indexed 2025-02-22
Download link:
https://modelscope.cn/datasets/arvin1988/final-audio-3
Resource description:
# StepEval-Audio-360
## Introduction
We release StepEval-Audio-360, a new benchmark of 137 multi-turn Chinese prompts sourced from real users. It is designed to evaluate the quality of generated responses across the following dimensions: linguistic competence, emotional intelligence, logical reasoning, creative generation, commonsense knowledge, complex instruction adherence, role-playing capabilities, gaming interactions, real-world task handling, memory retention, safety compliance, educational tutoring, and comprehensive understanding. Each multi-turn dialogue follows this format:
```
{
  "conversation_id": 7,
  "category": "Language Ability",
  "conversation": [
    {
      "role": "user",
      "text": "用四川话介绍一下。",
      "audio_filename": "conversation_id_7_turn_1.wav"
    },
    {
      "role": "assistant",
      "text": "(四川话)四川,简称“川”或“蜀”,省会是成都,位于中国西南部,地处长江上游,以山地、丘陵和平原为主,地形多样。四川是中国重要的经济、文化、科教和交通中心之一,也是西南地区重要的经济中心和全国重要的增长极。四川历史悠久"
    },
    {
      "role": "user",
      "text": "莎士比亚。",
      "audio_filename": "conversation_id_7_turn_2.wav"
    }
  ]
}
```
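The example record ends on a user turn, which suggests that the model under test is expected to answer that final turn given the earlier ones. The following is a minimal sketch of that split for a record in the format above. The helper name `split_history` and the "answer the last user turn" convention are assumptions for illustration, not part of the dataset's tooling, and the assistant text is shortened.

```python
from typing import Any

# Sample record in the format shown above (assistant text shortened for brevity).
record: dict[str, Any] = {
    "conversation_id": 7,
    "category": "Language Ability",
    "conversation": [
        {"role": "user", "text": "用四川话介绍一下。",
         "audio_filename": "conversation_id_7_turn_1.wav"},
        {"role": "assistant", "text": "(四川话)四川,简称“川”或“蜀”"},
        {"role": "user", "text": "莎士比亚。",
         "audio_filename": "conversation_id_7_turn_2.wav"},
    ],
}


def split_history(rec: dict[str, Any]):
    """Return (history, final_turn). Hypothetical helper, not dataset tooling."""
    turns = rec["conversation"]
    # Assumption: the turn to be answered is the last one and comes from the user.
    if not turns or turns[-1]["role"] != "user":
        raise ValueError("expected the conversation to end on a user turn")
    return turns[:-1], turns[-1]


history, final_turn = split_history(record)

# Only turns that carry an "audio_filename" value have a matching WAV file.
audio_files = [t["audio_filename"] for t in record["conversation"]
               if t.get("audio_filename")]
print(len(history), final_turn["text"], audio_files)
```

In this record only the user turns reference audio; the assistant turn is text only.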
## User Manual
* Download the dataset
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/stepfun-ai/StepEval-Audio-360
cd StepEval-Audio-360
git lfs pull
```
* Decompress audio data
```
mkdir audios
tar -xvf audios.tar.gz -C audios
```
* How to use
```
from datasets import load_dataset

dataset = load_dataset("stepfun-ai/StepEval-Audio-360")
dataset = dataset["test"]
for item in dataset:
    conversation_id = item["conversation_id"]
    category = item["category"]
    conversation = item["conversation"]

    # parse multi-turn dialogue data
    for turn in conversation:
        role = turn["role"]
        text = turn["text"]
        # name of the decompressed audio file; None or absent when a turn has no audio
        audio_filename = turn.get("audio_filename")
        if audio_filename is not None:
            print(role, text, audio_filename)
        else:
            print(role, text)
```
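The `audio_filename` field only names a file, so locating it depends on where the archive was extracted. Below is a minimal sketch, assuming the WAV files end up somewhere under the `audios` directory created above and that they are PCM WAV files readable by Python's standard `wave` module. The archive's internal layout is not described here, so the lookup falls back to a recursive search. The helper names `find_audio` and `wav_duration_seconds` are hypothetical.

```python
import wave
from pathlib import Path
from typing import Optional


def find_audio(audio_root: str, audio_filename: str) -> Optional[Path]:
    """Locate a turn's WAV file under audio_root (hypothetical helper)."""
    root = Path(audio_root)
    direct = root / audio_filename
    if direct.is_file():
        return direct
    # Assumption: the archive may unpack into a nested folder, so search recursively.
    return next(root.rglob(audio_filename), None)


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Inside the loop above, `find_audio("audios", audio_filename)` returns the file path, or `None` when the file cannot be found, and the result can be passed to `wav_duration_seconds` to check that the audio is readable.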
Provider: maas
Created: 2025-02-16



