S2S-Arena

Name: S2S-Arena
Creator: maas
Published: 2025-12-05 16:26:19
License: No description available

ModelScope Community: updated 2025-12-05, indexed 2025-03-15

Download link:

https://modelscope.cn/datasets/FreedomIntelligence/S2S-Arena


Resource description:

# S2S-Arena Dataset

This repository hosts the **S2S-Arena** dataset. It covers four practical domains with 21 tasks, includes 154 instructions of varying difficulty levels, and features a mix of samples from TTS synthesis, human recordings, and existing audio datasets.

[Project Page](https://huggingface.co/spaces/FreedomIntelligence/S2S-Arena)

## Introduction

### GitHub Repository

For more information and access to the dataset, please visit the GitHub repository: [S2S-Arena on GitHub](https://github.com/FreedomIntelligence/S2S-Arena)

### Related Publication

For detailed insights into the dataset's construction, methodology, and applications, please refer to the accompanying academic publication: [S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information](https://huggingface.co/papers/2503.05085)

## Data Description

The dataset includes labeled audio files, textual emotion annotations, language translations, and task-specific metadata, supporting fine-grained analysis and application in machine learning. Each entry follows this format:

```json
{
  "id": "emotion_audio_0",
  "input_path": "./emotion/audio_0.wav",
  "text": "[emotion: happy]Kids are talking by the door",
  "task": "Emotion recognition and expression",
  "task_description": "Can the model recognize emotions and provide appropriate responses based on different emotions?",
  "text_cn": "孩子们在门旁说话",
  "language": "English",
  "category": "Social Companionship",
  "level": "L3"
}
```

1. `id`: Unique identifier for each sample
2. `input_path`: Path to the audio file
3. `text`: English text with emotion annotation
4. `task`: Primary task associated with the data
5. `task_description`: Task description for model interpretability
6. `text_cn`: Chinese translation of the English text
7. `language`: Language of the input
8. `category`: Interaction context category
9. `level`: Difficulty or complexity level of the sample

Some entries also include a `noise` attribute, indicating that noise has been added to the sample and specifying the type of noise.

## BibTeX

```
@misc{jiang2025s2sarenaevaluatingspeech2speechprotocols,
  title={S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information},
  author={Feng Jiang and Zhiyu Lin and Fan Bu and Yuhao Du and Benyou Wang and Haizhou Li},
  year={2025},
  eprint={2503.05085},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.05085},
}
```
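As an illustration of the entry format given under Data Description, the following is a minimal Python sketch that parses one entry, checks for the nine documented fields, and separates the bracketed annotation from the transcript. The function name `parse_entry`, the derived keys `annotation` and `transcript`, and the assumption that annotations always appear as a single `[key: value]` prefix of `text` are hypothetical and are not part of the dataset's own tooling. The sample entry is the one shown in the README.

```python
import json
import re

# Sample entry copied from the Data Description section.
SAMPLE = """
{
  "id": "emotion_audio_0",
  "input_path": "./emotion/audio_0.wav",
  "text": "[emotion: happy]Kids are talking by the door",
  "task": "Emotion recognition and expression",
  "task_description": "Can the model recognize emotions and provide appropriate responses based on different emotions?",
  "text_cn": "孩子们在门旁说话",
  "language": "English",
  "category": "Social Companionship",
  "level": "L3"
}
"""

# The nine fields listed in the README.
REQUIRED_FIELDS = (
    "id", "input_path", "text", "task", "task_description",
    "text_cn", "language", "category", "level",
)

# Assumption: an annotation is a single "[key: value]" prefix of "text".
ANNOTATION = re.compile(r"^\[(\w+):\s*([^\]]+)\]\s*")


def parse_entry(raw: str) -> dict:
    """Parse one JSON entry and split its annotation from the transcript."""
    entry = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError(f"entry is missing fields: {missing}")

    match = ANNOTATION.match(entry["text"])
    if match:
        annotation = {match.group(1): match.group(2).strip()}
        transcript = entry["text"][match.end():]
    else:
        annotation = {}
        transcript = entry["text"]

    return {
        **entry,
        "annotation": annotation,
        "transcript": transcript,
        # "noise" is optional per the README; None means it is absent.
        "noise": entry.get("noise"),
    }


parsed = parse_entry(SAMPLE)
print(parsed["annotation"], parsed["transcript"], parsed["level"])
```

Entries whose `text` carries no bracketed prefix pass through with an empty `annotation`, so the same helper can be applied across tasks that are not emotion-related.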


Provider:

maas

Created:

2025-03-11
