
jspaulsen/esd

Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/jspaulsen/esd
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- zh
- en
tags:
- emotion
- speech
- voice
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: transcript
    dtype: string
  - name: emotion
    dtype: string
  - name: speaker_id
    dtype: string
  - name: gender
    dtype: string
  - name: language
    dtype: string
  splits:
  - name: train
    num_bytes: 3353221499.0
    num_examples: 35000
  download_size: 3145534453
  dataset_size: 3353221499.0
---

# Emotional Speech Dataset (ESD)

The Emotional Speech Dataset (ESD) is a multilingual emotional speech corpus containing parallel recordings in English and Chinese across 5 emotions.

## Dataset Details

- **Total samples**: 35,000
- **Speakers**: 20 (10 Chinese, 10 English)
- **Emotions**: anger, happiness, neutral, sadness, surprise (7,000 each)
- **Languages**: Chinese (zh), English (en) - 17,500 each
- **Gender**: 10 male, 10 female speakers

## Dataset Structure

| Column | Description |
|--------|-------------|
| `audio` | Audio waveform (WAV) |
| `transcript` | Text transcription |
| `emotion` | anger, happiness, neutral, sadness, surprise |
| `speaker_id` | Speaker identifier (0001-0020) |
| `gender` | male / female |
| `language` | zh (Chinese) / en (English) |

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("jspaulsen/esd")
```

## Citation

```bibtex
@inproceedings{zhou2021seen,
  title={Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset},
  author={Zhou, Kun and Sisman, Berrak and Liu, Rui and Li, Haizhou},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={920--924},
  year={2021},
  organization={IEEE}
}

@article{zhou2021emotional,
  title={Emotional voice conversion: Theory, databases and ESD},
  journal={Speech Communication},
  volume={137},
  pages={1-18},
  year={2022},
  issn={0167-6393}
}
```
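The column layout described in the card lends itself to simple subset selection and label counting. The following is a minimal sketch, not part of the original card: the helper names `select_rows` and `label_counts` are hypothetical, the demo rows are synthetic, and the code assumes each row is a dict with the `emotion` and `language` keys listed in the structure table.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional

# Label values as listed in the dataset card.
EMOTIONS = ("anger", "happiness", "neutral", "sadness", "surprise")
LANGUAGES = ("zh", "en")

Row = Dict[str, object]


def select_rows(
    rows: Iterable[Row],
    emotion: Optional[str] = None,
    language: Optional[str] = None,
) -> Iterator[Row]:
    """Return an iterator over rows matching the given emotion and/or language.

    Hypothetical helper: arguments are validated eagerly, rows are filtered lazily.
    """
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return (
        row
        for row in rows
        if (emotion is None or row["emotion"] == emotion)
        and (language is None or row["language"] == language)
    )


def label_counts(rows: Iterable[Row], column: str) -> Counter:
    """Count how often each value of `column` occurs (hypothetical helper)."""
    return Counter(row[column] for row in rows)


# Synthetic rows for illustration only; they are not taken from the dataset.
demo_rows = [
    {"emotion": "anger", "language": "zh", "speaker_id": "0001"},
    {"emotion": "anger", "language": "en", "speaker_id": "0011"},
    {"emotion": "neutral", "language": "en", "speaker_id": "0011"},
]

angry = list(select_rows(demo_rows, emotion="anger"))
per_language = label_counts(demo_rows, "language")

# Possible use against the hub (downloads roughly 3.1 GB, so it is not run here):
#   from datasets import load_dataset
#   ds = load_dataset("jspaulsen/esd", split="train")
#   meta = ds.remove_columns("audio")  # avoid decoding audio while counting
#   print(label_counts(meta, "emotion"))
```

If the card's figures hold, counting the `emotion` column over the full train split should give 7,000 rows per label, and counting `language` should give 17,500 per language.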

Provided by:
jspaulsen