CSEMOTIONS

Name: CSEMOTIONS
Creator: maas
Published: 2026-05-15 17:53:42
License: No description provided

ModelScope: updated 2026-05-15, indexed 2025-11-08

Download link:

https://modelscope.cn/datasets/AIDC-AI/CSEMOTIONS


Description:

# CSEMOTIONS: High-Quality Mandarin Emotional Speech Dataset

[Paper](https://huggingface.co/papers/2508.02038) | [Code](https://github.com/AIDC-AI/Marco-Voice) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

**CSEMOTIONS** is a high-quality Mandarin emotional speech dataset designed for research on expressive speech synthesis, emotion recognition, and voice cloning. It contains studio-quality recordings from six professional voice actors across seven carefully curated emotion categories, supporting research on controllable and natural speech generation.

## Dataset Summary

- **Name:** CSEMOTIONS
- **Total Duration:** ~10 hours
- **Speakers:** 6 (3 male, 3 female) native Mandarin speakers, all professional voice actors
- **Emotions:** Neutral, Happy, Angry, Sad, Surprise, Playfulness, Fearful
- **Language:** Mandarin Chinese
- **Sampling Rate:** 48 kHz, 24-bit PCM
- **Recording Setting:** Professional studio environment
- **Evaluation Prompts:** 100 per emotion, in both English and Chinese

## Dataset Structure

Each data sample includes:

- **audio**: The speech waveform (48 kHz, 24-bit, WAV)
- **transcript**: The transcribed sentence in Mandarin
- **emotion**: One of {Neutral, Happy, Angry, Sad, Surprise, Playfulness, Fearful}
- **speaker_id**: An anonymized speaker identifier (e.g., `S01`)
- **gender**: Male/Female
- **prompt_id**: Unique identifier for each utterance

## Intended Uses

CSEMOTIONS is intended for:

- Expressive text-to-speech (TTS) and voice cloning systems
- Speech emotion recognition (SER) research
- Cross-lingual and cross-emotional synthesis experiments
- Benchmarking emotion transfer or disentanglement models

## Dataset Details

| Property           | Value                                                      |
|--------------------|------------------------------------------------------------|
| Total audio hours  | ~10                                                        |
| Number of speakers | 6 (3♂, 3♀, anonymized IDs)                                 |
| Emotions           | Neutral, Happy, Angry, Sad, Surprise, Playfulness, Fearful |
| Language           | Mandarin Chinese                                           |
| Format             | WAV, mono, 48 kHz / 24-bit                                 |
| Studio quality     | Yes                                                        |

| Emotion     | Duration    | Sentences |
|-------------|-------------|-----------|
| Sad         | 1.73 h      | 546       |
| Angry       | 1.43 h      | 769       |
| Happy       | 1.51 h      | 603       |
| Surprise    | 1.25 h      | 508       |
| Fearful     | 1.92 h      | 623       |
| Playfulness | 1.23 h      | 621       |
| Neutral     | 1.14 h      | 490       |
| **Total**   | **10.24 h** | **4160**  |

## Download and Usage

To use CSEMOTIONS with [🤗 Datasets](https://huggingface.co/docs/datasets):

```python
from datasets import load_dataset

dataset = load_dataset("AIDC-AI/CSEMOTIONS")
```

## Acknowledgements

We thank our professional voice actors and the recording studio staff for their contributions.

## License

This project is licensed under the Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0, SPDX-License-Identifier: Apache-2.0).

## 📜 Citation

```bibtex
@misc{tian2025marcovoicetechnicalreport,
  title={Marco-Voice Technical Report},
  author={Fengping Tian and Chenyang Lyu and Xuanfan Ni and Haoqin Sun and Qingjuan Li and Zhiqiang Qian and Haijun Li and Longyue Wang and Zhao Xu and Weihua Luo and Kaifu Zhang},
  year={2025},
  eprint={2508.02038},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.02038},
}
```

## Disclaimer

We used compliance-checking algorithms during the training process to ensure, to the best of our ability, that the trained model and dataset are compliant. Because of the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the dataset is completely free of copyright issues or improper content. If you believe any content infringes your rights or is improper, please contact us and we will address the matter promptly.

---
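As a quick consistency check on the per-emotion table under Dataset Details, the sketch below recomputes the totals and the mean utterance length for each emotion. All numbers are copied from the table's rounded figures, and nothing is read from the dataset itself. The sentence counts sum exactly to 4160. The rounded durations sum to 10.21 h rather than the reported 10.24 h, a gap that is consistent with each row having been rounded to two decimals.

```python
# Per-emotion figures copied from the Dataset Details table: (hours, sentences).
EMOTION_STATS = {
    "Sad": (1.73, 546),
    "Angry": (1.43, 769),
    "Happy": (1.51, 603),
    "Surprise": (1.25, 508),
    "Fearful": (1.92, 623),
    "Playfulness": (1.23, 621),
    "Neutral": (1.14, 490),
}


def summarize(stats):
    """Return (total hours, total sentences, mean seconds per sentence by emotion)."""
    total_hours = sum(hours for hours, _ in stats.values())
    total_sentences = sum(count for _, count in stats.values())
    mean_secs = {label: hours * 3600 / count for label, (hours, count) in stats.items()}
    return total_hours, total_sentences, mean_secs


if __name__ == "__main__":
    hours, sentences, mean_secs = summarize(EMOTION_STATS)
    print(f"total: {hours:.2f} h, {sentences} sentences")
    # List emotions from shortest to longest average utterance.
    for label, secs in sorted(mean_secs.items(), key=lambda kv: kv[1]):
        print(f"{label:<12} {secs:5.2f} s/sentence")
```

On these figures, Angry utterances are the shortest on average (about 6.7 s) and Sad ones the longest (about 11.4 s). This is worth keeping in mind when batching by duration or balancing classes by time rather than by count.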
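The Format row (mono, 48 kHz, 24-bit) fixes the raw data rate at 48,000 samples/s × 3 bytes × 1 channel = 144,000 bytes/s, or 1.152 Mbit/s. The following sketch turns that into a rough disk-space estimate for the full 10.24 h. It counts only the PCM payload and ignores WAV headers and any compression the hosting platform may apply, so actual download sizes can differ. The helper name `pcm_bytes` is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # 48 kHz, as stated on the card
BYTES_PER_SAMPLE = 3     # 24-bit PCM
CHANNELS = 1             # mono


def pcm_bytes(hours, sample_rate=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Raw PCM payload size in bytes for `hours` of audio (WAV headers excluded)."""
    seconds = hours * 3600
    return round(seconds * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    rate = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    print(f"data rate: {rate} bytes/s ({rate * 8 / 1e6:.3f} Mbit/s)")
    total = pcm_bytes(10.24)
    print(f"10.24 h of audio: {total} bytes, about {total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```

For the whole corpus this comes to roughly 5.31 GB (4.94 GiB) of uncompressed samples.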
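The per-sample fields listed under Dataset Structure make it easy to slice the corpus, for example to count utterances per speaker and emotion or to pull one speaker's recordings for a single emotion. The following is a minimal sketch that assumes each record is a mapping with the documented `speaker_id`, `emotion`, and `prompt_id` keys. The helpers `count_by_speaker_emotion` and `select` are hypothetical, and the exact column names and label spellings in the hosted copy should be checked before relying on them.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping, Optional


def count_by_speaker_emotion(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Count utterances per (speaker_id, emotion) pair.

    Assumes each record exposes the 'speaker_id' and 'emotion' fields
    described on the dataset card (hypothetical helper).
    """
    counts: Counter = Counter()
    for rec in records:
        counts[(rec["speaker_id"], rec["emotion"])] += 1
    return counts


def select(records: Iterable[Mapping[str, Any]], *,
           speaker_id: Optional[str] = None,
           emotion: Optional[str] = None) -> Iterator[Mapping[str, Any]]:
    """Yield records matching the given speaker and/or emotion (None means any)."""
    for rec in records:
        if speaker_id is not None and rec["speaker_id"] != speaker_id:
            continue
        if emotion is not None and rec["emotion"] != emotion:
            continue
        yield rec
```

With the hosted copy, something like `ds = load_dataset("AIDC-AI/CSEMOTIONS", split="train")` followed by `count_by_speaker_emotion(ds.remove_columns("audio"))` should work, assuming the split is named `train` and the columns match the card. Iterating a 🤗 `Dataset` yields one dict per row. Dropping the `audio` column first avoids decoding every waveform when only the labels are needed.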


Provider:

maas

Created:

2025-10-27
