ylacombe/expresso
Source: Hugging Face. Updated 2024-04-30. Indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/ylacombe/expresso
Resource description:
---
dataset_info:
  config_name: read
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: speaker_id
    dtype: string
  - name: style
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 5702432944.34
    num_examples: 11615
  download_size: 5761373569
  dataset_size: 5702432944.34
configs:
- config_name: read
  data_files:
  - split: train
    path: read/train-*
license: cc-by-nc-4.0
language:
- en
pretty_name: The Expresso Dataset
---
# The Expresso Dataset
[[paper]](https://arxiv.org/abs/2308.05725) [[demo samples]](https://speechbot.github.io/expresso/) [[Original repository]](https://github.com/facebookresearch/textlesslib/tree/main/examples/expresso/dataset)
## Introduction
The Expresso dataset is a high-quality (48 kHz) expressive speech dataset that includes both expressively rendered read speech (8 styles, in mono WAV format) and improvised dialogues (26 styles, in stereo WAV format). The dataset includes 4 speakers (2 male, 2 female) and totals 40 hours (11h read, 30h improvised). Transcriptions of the read speech are also provided.
You can listen to samples from the Expresso Dataset at [this website](https://speechbot.github.io/expresso/).
## Data Statistics
Here are the statistics of Expresso's expressive styles:

| Style | Read (min) | Improvised (min) | Total (hrs) |
|-----------------|------------|------------------|-------------|
| angry | - | 82 | 1.4 |
| animal | - | 27 | 0.4 |
| animal_directed | - | 32 | 0.5 |
| awe | - | 92 | 1.5 |
| bored | - | 92 | 1.5 |
| calm | - | 93 | 1.6 |
| child | - | 28 | 0.4 |
| child_directed | - | 38 | 0.6 |
| confused | 94 | 66 | 2.7 |
| default | 133 | 158 | 4.9 |
| desire | - | 92 | 1.5 |
| disgusted | - | 118 | 2.0 |
| enunciated | 116 | 62 | 3.0 |
| fast | - | 98 | 1.6 |
| fearful | - | 98 | 1.6 |
| happy | 74 | 92 | 2.8 |
| laughing | 94 | 103 | 3.3 |
| narration | 21 | 76 | 1.6 |
| non_verbal | - | 32 | 0.5 |
| projected | - | 94 | 1.6 |
| sad | 81 | 101 | 3.0 |
| sarcastic | - | 106 | 1.8 |
| singing\* | - | 4 | 0.07 |
| sleepy | - | 93 | 1.5 |
| sympathetic | - | 100 | 1.7 |
| whisper | 79 | 86 | 2.8 |
| **Total** | **11.5h** | **34.4h** | **45.9h** |

\*Singing is the only improvised style that is not in dialogue format.
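As a quick sanity check, the per-style minutes in the table can be summed to reproduce the totals row. The sketch below is only an illustration: the dictionary name `STYLE_MINUTES` and the helper `totals_in_hours` are made up for this example, the values are copied from the table, and a dash is treated as 0 minutes.

```python
# Minutes per style copied from the table: (read, improvised).
# A "-" in the table is recorded here as 0.
STYLE_MINUTES = {
    "angry": (0, 82),
    "animal": (0, 27),
    "animal_directed": (0, 32),
    "awe": (0, 92),
    "bored": (0, 92),
    "calm": (0, 93),
    "child": (0, 28),
    "child_directed": (0, 38),
    "confused": (94, 66),
    "default": (133, 158),
    "desire": (0, 92),
    "disgusted": (0, 118),
    "enunciated": (116, 62),
    "fast": (0, 98),
    "fearful": (0, 98),
    "happy": (74, 92),
    "laughing": (94, 103),
    "narration": (21, 76),
    "non_verbal": (0, 32),
    "projected": (0, 94),
    "sad": (81, 101),
    "sarcastic": (0, 106),
    "singing": (0, 4),
    "sleepy": (0, 93),
    "sympathetic": (0, 100),
    "whisper": (79, 86),
}


def totals_in_hours(style_minutes):
    """Return (read, improvised, combined) totals converted from minutes to hours."""
    read = sum(r for r, _ in style_minutes.values())
    improvised = sum(i for _, i in style_minutes.values())
    return read / 60, improvised / 60, (read + improvised) / 60


read_h, improv_h, total_h = totals_in_hours(STYLE_MINUTES)
print(f"read: {read_h:.1f} h, improvised: {improv_h:.1f} h, total: {total_h:.1f} h")
# prints: read: 11.5 h, improvised: 34.4 h, total: 45.9 h
```

The same dictionary also confirms the style counts quoted in the introduction: 8 styles have read minutes and all 26 have improvised minutes.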
## Audio Quality
The audio was recorded in a professional recording studio with minimal background noise at 48 kHz/24-bit. The read speech and singing files are in mono WAV format. The dialogue files are in stereo (one channel per actor), which preserves the original flow of turn-taking.
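To get a feel for what these recording parameters mean for storage, the raw data rate can be worked out from the sample rate, bit depth, and channel count. This is a rough sketch that assumes uncompressed PCM and ignores WAV header overhead; the function name `pcm_bytes_per_second` is hypothetical, and the files as actually distributed may be stored differently.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels


# 48 kHz / 24-bit, as stated for the studio recordings.
mono_rate = pcm_bytes_per_second(48_000, 24, 1)    # read speech and singing
stereo_rate = pcm_bytes_per_second(48_000, 24, 2)  # dialogues, one channel per actor

print(f"mono:   {mono_rate} B/s, {mono_rate * 60 / 1e6:.2f} MB per minute")
print(f"stereo: {stereo_rate} B/s, {stereo_rate * 60 / 1e6:.2f} MB per minute")
```

Under these assumptions a mono minute is about 8.64 MB and a stereo minute about 17.28 MB.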
### Read Speech
The `read` config contains all of the read speech and the singing style.
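A minimal sketch of how the `read` config might be opened with the Hugging Face `datasets` library and tallied by style. It assumes `datasets` is installed and the Hub is reachable; the helper names `load_read_split` and `count_styles` are made up for this example, and the demo rows are fabricated stand-ins that only mimic the card's string fields (their style values are illustrative, not taken from the data).

```python
from collections import Counter


def load_read_split(streaming=True):
    """Open the train split of the `read` config from the Hugging Face Hub.

    Assumption: the `datasets` library is installed. The import is deferred
    so that `count_styles` below can be used without it.
    """
    from datasets import load_dataset

    return load_dataset(
        "ylacombe/expresso", "read", split="train", streaming=streaming
    )


def count_styles(examples):
    """Count examples per `style` value in any iterable of dict-like rows."""
    return Counter(example["style"] for example in examples)


# Fabricated rows for an offline demo; real rows also carry an `audio` field.
fake_rows = [
    {"id": "a", "speaker_id": "s1", "style": "happy", "text": "Hello."},
    {"id": "b", "speaker_id": "s2", "style": "sad", "text": "Goodbye."},
    {"id": "c", "speaker_id": "s1", "style": "happy", "text": "Hi again."},
]
style_counts = count_styles(fake_rows)
print(style_counts.most_common())
```

With network access, `count_styles(load_read_split())` would tally the real split. Streaming avoids downloading the full split up front, though iterating over it may still decode audio and can be slow.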
## License
The Expresso dataset is distributed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
## Reference
For more information, see the paper: [EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis](https://arxiv.org/abs/2308.05725), Tu Anh Nguyen*, Wei-Ning Hsu*, Antony D'Avirro*, Bowen Shi*, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi⁺, Emmanuel Dupoux⁺, INTERSPEECH 2023.
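Provider:
ylacombe
Summary of Original Information
Dataset Overview
Dataset Name
- Name: The Expresso Dataset
- Alias: read (the config name)
Dataset Features
- Audio: audio
- Text: string
- Speaker ID: string
- Style: string
- ID: string
Dataset Splits
- Train: 11615 examples, 5702432944.34 bytes
Dataset Size
- Download size: 5761373569 bytes
- Dataset size: 5702432944.34 bytes
Language
- Language: English
License
- License: CC BY-NC 4.0
Audio Quality
- Recording environment: professional recording studio
- Sample rate: 48 kHz
- Bit depth: 24-bit
- Format: read speech and singing in mono WAV; dialogues in stereo (one channel per actor)
Dataset Content
- Speech types: expressive read speech (8 styles) and improvised dialogues (26 styles)
- Number of speakers: 4 (2 male, 2 female)
- Total duration: 40 hours (11 hours read, 30 hours improvised)
- Read speech transcriptions: provided
Data Statistics
- Read speech total: 11.5 hours
- Improvised dialogue total: 34.4 hours
- Overall total: 45.9 hours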



