ylacombe/expresso

Name: ylacombe/expresso
Creator: ylacombe
Published: 2024-04-30 16:49:14
License: CC BY-NC 4.0

Source: Hugging Face. Updated 2024-04-30, indexed 2024-05-25.

Download link:

https://hf-mirror.com/datasets/ylacombe/expresso

Resource description:

---
dataset_info:
  config_name: read
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: speaker_id
    dtype: string
  - name: style
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 5702432944.34
    num_examples: 11615
  download_size: 5761373569
  dataset_size: 5702432944.34
configs:
- config_name: read
  data_files:
  - split: train
    path: read/train-*
license: cc-by-nc-4.0
language:
- en
pretty_name: The Expresso Dataset
---

# The Expresso Dataset

[[paper]](https://arxiv.org/abs/2308.05725) [[demo samples]](https://speechbot.github.io/expresso/) [[Original repository]](https://github.com/facebookresearch/textlesslib/tree/main/examples/expresso/dataset)

## Introduction

The Expresso dataset is a high-quality (48kHz) expressive speech dataset that includes both expressively rendered read speech (8 styles, in mono wav format) and improvised dialogues (26 styles, in stereo wav format). The dataset includes 4 speakers (2 males, 2 females), and totals 40 hours (11h read, 30h improvised). The transcriptions of the read speech are also provided.

You can listen to samples from the Expresso Dataset at [this website](https://speechbot.github.io/expresso/).

## Data Statistics

Here are the statistics of Expresso's expressive styles:

| Style | Read (min) | Improvised (min) | Total (hrs) |
|-----------------|-----------|-----------|-----------|
| angry | - | 82 | 1.4 |
| animal | - | 27 | 0.4 |
| animal_directed | - | 32 | 0.5 |
| awe | - | 92 | 1.5 |
| bored | - | 92 | 1.5 |
| calm | - | 93 | 1.6 |
| child | - | 28 | 0.4 |
| child_directed | - | 38 | 0.6 |
| confused | 94 | 66 | 2.7 |
| default | 133 | 158 | 4.9 |
| desire | - | 92 | 1.5 |
| disgusted | - | 118 | 2.0 |
| enunciated | 116 | 62 | 3.0 |
| fast | - | 98 | 1.6 |
| fearful | - | 98 | 1.6 |
| happy | 74 | 92 | 2.8 |
| laughing | 94 | 103 | 3.3 |
| narration | 21 | 76 | 1.6 |
| non_verbal | - | 32 | 0.5 |
| projected | - | 94 | 1.6 |
| sad | 81 | 101 | 3.0 |
| sarcastic | - | 106 | 1.8 |
| singing\* | - | 4 | 0.07 |
| sleepy | - | 93 | 1.5 |
| sympathetic | - | 100 | 1.7 |
| whisper | 79 | 86 | 2.8 |
| **Total** | **11.5h** | **34.4h** | **45.9h** |

\*singing is the only improvised style that is not in dialogue format.

## Audio Quality

The audio was recorded in a professional recording studio with minimal background noise at 48kHz/24bit. The files for read speech and singing are in mono wav format; the files for the dialogue section are in stereo (one channel per actor), where the original flow of turn-taking is preserved.

### Read Speech

The `read` config contains all the read speech and the singing style.

## License

The Expresso dataset is distributed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Reference

For more information, see the paper: [EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis](https://arxiv.org/abs/2308.05725), Tu Anh Nguyen\*, Wei-Ning Hsu\*, Antony D'Avirro\*, Bowen Shi\*, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi⁺, Emmanuel Dupoux⁺, INTERSPEECH 2023.
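The totals row of the Data Statistics table can be cross-checked by summing the per-style minutes. The sketch below copies the figures from the table by hand; the dictionary name `STYLE_MINUTES` and the helper `total_hours` are illustrative names, not part of the dataset or any library.

```python
# Per-style durations in minutes as (read, improvised), copied from the
# Data Statistics table. A "-" in the table is stored here as 0.
STYLE_MINUTES = {
    "angry": (0, 82), "animal": (0, 27), "animal_directed": (0, 32),
    "awe": (0, 92), "bored": (0, 92), "calm": (0, 93),
    "child": (0, 28), "child_directed": (0, 38), "confused": (94, 66),
    "default": (133, 158), "desire": (0, 92), "disgusted": (0, 118),
    "enunciated": (116, 62), "fast": (0, 98), "fearful": (0, 98),
    "happy": (74, 92), "laughing": (94, 103), "narration": (21, 76),
    "non_verbal": (0, 32), "projected": (0, 94), "sad": (81, 101),
    "sarcastic": (0, 106), "singing": (0, 4), "sleepy": (0, 93),
    "sympathetic": (0, 100), "whisper": (79, 86),
}


def total_hours(column):
    """Sum one column (0 = read, 1 = improvised) and convert minutes to hours."""
    return sum(row[column] for row in STYLE_MINUTES.values()) / 60


read_h = total_hours(0)
improvised_h = total_hours(1)
read_styles = [name for name, (read, _) in STYLE_MINUTES.items() if read > 0]

print(f"read: {read_h:.1f} h over {len(read_styles)} styles")
print(f"improvised: {improvised_h:.1f} h over {len(STYLE_MINUTES)} styles")
print(f"total: {read_h + improvised_h:.1f} h")
```

The sums reproduce the table's totals row (11.5 h, 34.4 h, 45.9 h) and the style counts given in the Introduction (8 read, 26 improvised). The Introduction's "40 hours (11h read, 30h improvised)" is lower than these totals, so treat it as a rounded figure.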

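Provider:

ylacombe

Summary of original information

Dataset overview

Dataset name

Name: The Expresso Dataset
Config name: read

Dataset features

audio: type audio
text: type string
speaker_id: type string
style: type string
id: type string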
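As a minimal sketch of how these fields might be accessed, the snippet below loads the `read` config with the Hugging Face `datasets` library and prints the string-valued fields of one example. It assumes `datasets` is installed and that the repository `ylacombe/expresso` is reachable on the default Hub endpoint rather than the mirror listed above. The helper names `load_read_split` and `describe_example` are hypothetical. The `audio` field is left out because its decoded form depends on the installed `datasets` version.

```python
def load_read_split(streaming=True):
    """Load the train split of the `read` config.

    Streaming avoids fetching the full download (about 5.8 GB) up front.
    Assumes the `datasets` package is installed; it is imported lazily so
    that the helper below can be used without it.
    """
    from datasets import load_dataset

    return load_dataset("ylacombe/expresso", "read", split="train", streaming=streaming)


# String-valued feature names listed in the dataset card.
TEXT_FIELDS = ("id", "speaker_id", "style", "text")


def describe_example(example):
    """Return the string-valued fields of one example as a single line."""
    return " | ".join(f"{name}={example[name]}" for name in TEXT_FIELDS)


# Usage (needs network access, so it is left commented out):
# ds = load_read_split()
# print(describe_example(next(iter(ds))))
```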

Dataset splits

train: 11,615 examples, 5702432944.34 bytes

Dataset size

Download size: 5761373569 bytes
Dataset size: 5702432944.34 bytes
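For a sense of scale, the byte counts above can be converted to more readable units. This is plain arithmetic on the figures in the card; the constant and function names are illustrative only.

```python
# Figures copied from the dataset card.
NUM_EXAMPLES = 11615
DATASET_BYTES = 5702432944.34
DOWNLOAD_BYTES = 5761373569


def to_gib(num_bytes):
    """Convert bytes to binary gigabytes (GiB, 2**30 bytes)."""
    return num_bytes / 2**30


# Average stored size of one example, in decimal kilobytes.
avg_kb = DATASET_BYTES / NUM_EXAMPLES / 1000

print(f"download: {DOWNLOAD_BYTES / 1e9:.2f} GB ({to_gib(DOWNLOAD_BYTES):.2f} GiB)")
print(f"average example: {avg_kb:.0f} kB")
```

The download comes to roughly 5.76 GB (about 5.37 GiB), and an average example occupies roughly 491 kB.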

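Language

Language: English

License

License: CC BY-NC 4.0

Audio quality

Recording environment: professional recording studio
Sampling rate: 48 kHz
Bit depth: 24-bit
Format: mono WAV for read speech and singing; stereo for the dialogues (one channel per actor)

Dataset contents

Speech types: expressive read speech (8 styles) and improvised dialogues (26 styles)
Number of speakers: 4 (2 male, 2 female)
Total duration: 40 hours (11 hours read, 30 hours improvised)
Read speech transcriptions: provided

Data statistics

Total read speech: 11.5 hours
Total improvised dialogue: 34.4 hours
Overall total: 45.9 hours

Citation

Paper: EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis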
