jspaulsen/esd
Source: Hugging Face. Updated 2026-04-01. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/jspaulsen/esd
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- zh
- en
tags:
- emotion
- speech
- voice
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: transcript
    dtype: string
  - name: emotion
    dtype: string
  - name: speaker_id
    dtype: string
  - name: gender
    dtype: string
  - name: language
    dtype: string
  splits:
  - name: train
    num_bytes: 3353221499.0
    num_examples: 35000
  download_size: 3145534453
  dataset_size: 3353221499.0
---
# Emotional Speech Dataset (ESD)
The Emotional Speech Dataset (ESD) is a multilingual emotional speech corpus of parallel recordings in English and Chinese, covering five emotions.
## Dataset Details
- **Total samples**: 35,000
- **Speakers**: 20 (10 Chinese, 10 English)
- **Emotions**: anger, happiness, neutral, sadness, surprise (7,000 each)
- **Languages**: Chinese (zh), English (en) (17,500 each)
- **Gender**: 10 male, 10 female speakers
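The counts above imply a balanced corpus (5 emotions × 7,000 = 35,000 and 2 languages × 17,500 = 35,000). The following is a minimal sketch of how that balance could be checked. The helper names `count_by` and `is_balanced` and the stand-in rows are illustrative assumptions, not part of the dataset or of any library.

```python
from collections import Counter


def count_by(rows, column):
    """Count how many rows carry each value of `column`."""
    return Counter(row[column] for row in rows)


def is_balanced(counts):
    """Return True when every label occurs the same number of times."""
    return len(set(counts.values())) <= 1


# Hypothetical stand-in rows that reuse the card's column names.
sample_rows = [
    {"emotion": "anger", "language": "zh"},
    {"emotion": "anger", "language": "en"},
    {"emotion": "neutral", "language": "zh"},
    {"emotion": "neutral", "language": "en"},
]

emotion_counts = count_by(sample_rows, "emotion")
language_counts = count_by(sample_rows, "language")
print(emotion_counts, is_balanced(emotion_counts))
print(language_counts, is_balanced(language_counts))
```

On the real data, the same helpers can be applied to the rows of the `train` split. If the figures in this card hold, each emotion should count 7,000 and each language 17,500.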
## Dataset Structure
| Column | Description |
|--------|-------------|
| `audio` | Audio waveform (WAV) |
| `transcript` | Text transcription |
| `emotion` | anger, happiness, neutral, sadness, surprise |
| `speaker_id` | Speaker identifier (0001-0020) |
| `gender` | male / female |
| `language` | zh (Chinese) / en (English) |
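The dataset ships a single `train` split, so a speaker-disjoint evaluation set has to be built by the user. The sketch below shows one way to do that with the `speaker_id` column. The function name `split_by_speaker`, the choice of held-out speaker, and the stand-in rows are assumptions made for illustration.

```python
def split_by_speaker(rows, held_out_speakers):
    """Split rows so that held-out speakers appear only in the test part."""
    held_out = set(held_out_speakers)
    train, test = [], []
    for row in rows:
        if row["speaker_id"] in held_out:
            test.append(row)
        else:
            train.append(row)
    return train, test


# Hypothetical stand-in rows that reuse the card's column names.
rows = [
    {"speaker_id": "0001", "emotion": "anger"},
    {"speaker_id": "0001", "emotion": "neutral"},
    {"speaker_id": "0002", "emotion": "anger"},
    {"speaker_id": "0003", "emotion": "sadness"},
]

train_rows, test_rows = split_by_speaker(rows, held_out_speakers=["0002"])
print(len(train_rows), len(test_rows))
```

Splitting on `speaker_id` rather than at random keeps every utterance of a speaker on one side, which avoids scoring a model on voices it has already heard.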
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("jspaulsen/esd")
```
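To work with one emotion or one language only, the loaded split can be filtered on the `emotion` and `language` columns. This is a hedged sketch: `make_filter` and `load_subset` are hypothetical helper names, while `load_dataset(..., split="train")` and `Dataset.filter` are existing calls in the `datasets` library. The import sits inside `load_subset` so the predicate can be tried without installing the library or downloading the data.

```python
def make_filter(emotion=None, language=None):
    """Build a row predicate; a value of None means 'do not filter on it'."""
    def keep(row):
        emotion_ok = emotion is None or row["emotion"] == emotion
        language_ok = language is None or row["language"] == language
        return emotion_ok and language_ok
    return keep


def load_subset(emotion=None, language=None):
    """Load the train split and keep only the matching rows (needs network)."""
    # Imported lazily so make_filter works without the datasets library.
    from datasets import load_dataset

    train = load_dataset("jspaulsen/esd", split="train")
    return train.filter(make_filter(emotion, language))


# The predicate can be checked on a plain dict first.
keep_english_anger = make_filter(emotion="anger", language="en")
print(keep_english_anger({"emotion": "anger", "language": "en"}))
```

Calling `load_subset(emotion="anger", language="en")` would then download the dataset and return the English anger utterances.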
## Citation
```bibtex
@inproceedings{zhou2021seen,
title={Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset},
author={Zhou, Kun and Sisman, Berrak and Liu, Rui and Li, Haizhou},
booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={920--924},
year={2021},
organization={IEEE}
}
@article{zhou2021emotional,
title={Emotional voice conversion: Theory, databases and ESD},
author={Zhou, Kun and Sisman, Berrak and Liu, Rui and Li, Haizhou},
journal={Speech Communication},
volume={137},
pages={1--18},
year={2022},
issn={0167-6393}
}
```
Provided by:
jspaulsen



