
ylacombe/expresso

Source: Hugging Face. Updated 2024-04-30; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/ylacombe/expresso
Resource description:
---
dataset_info:
  config_name: read
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: speaker_id
    dtype: string
  - name: style
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 5702432944.34
    num_examples: 11615
  download_size: 5761373569
  dataset_size: 5702432944.34
configs:
- config_name: read
  data_files:
  - split: train
    path: read/train-*
license: cc-by-nc-4.0
language:
- en
pretty_name: The Expresso Dataset
---

# The Expresso Dataset

[[paper]](https://arxiv.org/abs/2308.05725) [[demo samples]](https://speechbot.github.io/expresso/) [[Original repository]](https://github.com/facebookresearch/textlesslib/tree/main/examples/expresso/dataset)

## Introduction

The Expresso dataset is a high-quality (48kHz) expressive speech dataset that includes both expressively rendered read speech (8 styles, in mono wav format) and improvised dialogues (26 styles, in stereo wav format). The dataset includes 4 speakers (2 males, 2 females), and totals 40 hours (11h read, 30h improvised). The transcriptions of the read speech are also provided.

You can listen to samples from the Expresso Dataset at [this website](https://speechbot.github.io/expresso/).

## Data Statistics

Here are the statistics of Expresso's expressive styles:

| Style           | Read (min) | Improvised (min) | Total (hrs) |
|-----------------|------------|------------------|-------------|
| angry           | -          | 82               | 1.4         |
| animal          | -          | 27               | 0.4         |
| animal_directed | -          | 32               | 0.5         |
| awe             | -          | 92               | 1.5         |
| bored           | -          | 92               | 1.5         |
| calm            | -          | 93               | 1.6         |
| child           | -          | 28               | 0.4         |
| child_directed  | -          | 38               | 0.6         |
| confused        | 94         | 66               | 2.7         |
| default         | 133        | 158              | 4.9         |
| desire          | -          | 92               | 1.5         |
| disgusted       | -          | 118              | 2.0         |
| enunciated      | 116        | 62               | 3.0         |
| fast            | -          | 98               | 1.6         |
| fearful         | -          | 98               | 1.6         |
| happy           | 74         | 92               | 2.8         |
| laughing        | 94         | 103              | 3.3         |
| narration       | 21         | 76               | 1.6         |
| non_verbal      | -          | 32               | 0.5         |
| projected       | -          | 94               | 1.6         |
| sad             | 81         | 101              | 3.0         |
| sarcastic       | -          | 106              | 1.8         |
| singing*        | -          | 4                | 0.07        |
| sleepy          | -          | 93               | 1.5         |
| sympathetic     | -          | 100              | 1.7         |
| whisper         | 79         | 86               | 2.8         |
| **Total**       | **11.5h**  | **34.4h**        | **45.9h**   |

*singing is the only improvised style that is not in dialogue format.

## Audio Quality

The audio was recorded in a professional recording studio with minimal background noise at 48kHz/24bit. The files for read speech and singing are in mono wav format; the files for the dialogue section are in stereo (one channel per actor), where the original flow of turn-taking is preserved.

### Read Speech

The `read` config contains all the read speech and the singing style.

## License

The Expresso dataset is distributed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Reference

For more information, see the paper: [EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis](https://arxiv.org/abs/2308.05725), Tu Anh Nguyen*, Wei-Ning Hsu*, Antony D'Avirro*, Bowen Shi*, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi⁺, Emmanuel Dupoux⁺, INTERSPEECH 2023.
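Provided by:
ylacombe
Summary of original information

Dataset overview

Dataset name

  • Name: The Expresso Dataset
  • Config name: read

Dataset features

  • Audio (audio): audio type
  • Text (text): string type
  • Speaker ID (speaker_id): string type
  • Style (style): string type
  • ID (id): string type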
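
The five columns listed above can be checked record by record. The following is a minimal sketch, not the dataset authors' code: the helper names `check_record` and `stream_read_split` are hypothetical, and the loader assumes the Hugging Face `datasets` package (plus an audio decoding backend) is installed. The import is deferred so the validator can be used on its own.

```python
from typing import Any, Dict, Iterator

# Column names taken from the dataset card for the `read` config.
# The audio column is decoded by the `datasets` library, so only its
# presence is checked here, not its internal layout.
STRING_COLUMNS = ("text", "speaker_id", "style", "id")
ALL_COLUMNS = ("audio",) + STRING_COLUMNS


def check_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the documented columns."""
    missing = [name for name in ALL_COLUMNS if name not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for name in STRING_COLUMNS:
        if not isinstance(record[name], str):
            raise ValueError(f"column {name!r} should be a string")


def stream_read_split() -> Iterator[Dict[str, Any]]:
    """Yield validated records from the `read` config in streaming mode.

    Assumption: `pip install datasets` has been run. The import is lazy so
    that `check_record` works without the package.
    """
    from datasets import load_dataset

    dataset = load_dataset(
        "ylacombe/expresso", "read", split="train", streaming=True
    )
    for record in dataset:
        check_record(record)
        yield record
```

Streaming is used in this sketch because the full download is several gigabytes; calling `next(stream_read_split())` fetches only enough data to produce the first record.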

Dataset splits

  • Train split: 11,615 examples, 5702432944.34 bytes

Dataset size

  • Download size: 5761373569 bytes
  • Dataset size: 5702432944.34 bytes
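
The raw byte counts are easier to read in gigabytes. The sketch below only restates the figures above in other units and derives an average size per example; the function and constant names are illustrative, and the average is plain arithmetic rather than a figure from the dataset card.

```python
# Figures copied from the dataset card metadata.
NUM_EXAMPLES = 11_615
DATASET_SIZE_BYTES = 5_702_432_944.34
DOWNLOAD_SIZE_BYTES = 5_761_373_569


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return num_bytes / 1e9


def to_gib(num_bytes: float) -> float:
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


average_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

print(f"dataset size : {to_gb(DATASET_SIZE_BYTES):.2f} GB "
      f"({to_gib(DATASET_SIZE_BYTES):.2f} GiB)")
print(f"download size: {to_gb(DOWNLOAD_SIZE_BYTES):.2f} GB "
      f"({to_gib(DOWNLOAD_SIZE_BYTES):.2f} GiB)")
print(f"average per example: {average_bytes_per_example / 1e3:.0f} kB")
```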

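Language

  • Language: English

License

  • License: CC BY-NC 4.0

Audio quality

  • Recording environment: professional recording studio
  • Sampling rate: 48kHz
  • Bit depth: 24bit
  • Format: read speech and singing are mono wav; the dialogue section is stereo (one channel per actor)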
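
The sampling rate, bit depth, and channel count above determine the data rate of uncompressed PCM audio. The sketch below works through that arithmetic; it assumes plain linear PCM with no header overhead, and it says nothing about how the hosted files are actually encoded or stored.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed linear PCM audio, ignoring file headers."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels


# 48 kHz / 24 bit, as stated for the studio recordings.
mono_rate = pcm_bytes_per_second(48_000, 24, channels=1)    # read speech, singing
stereo_rate = pcm_bytes_per_second(48_000, 24, channels=2)  # dialogues

print(f"mono  : {mono_rate} bytes/s")    # 144000 bytes/s
print(f"stereo: {stereo_rate} bytes/s")  # 288000 bytes/s
print(f"one minute of mono audio: {mono_rate * 60 / 1e6:.2f} MB")
```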

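Dataset contents

  • Speech types: expressive read speech (8 styles) and improvised dialogues (26 styles)
  • Number of speakers: 4 (2 male, 2 female)
  • Total duration: 40 hours (11 hours read, 30 hours improvised)
  • Read-speech transcriptions: provided

Data statistics

  • Total read speech: 11.5 hours
  • Total improvised dialogue: 34.4 hours
  • Total: 45.9 hours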
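
The totals above can be reproduced from the per-style minutes in the statistics table. The following sketch copies those minutes into a dictionary (with `None` standing in for the table's "-") and recomputes the totals and the style counts; the variable names are illustrative.

```python
# (read minutes, improvised minutes) per style, copied from the statistics table.
# None marks a style with no recordings of that type ("-" in the table).
MINUTES = {
    "angry": (None, 82), "animal": (None, 27), "animal_directed": (None, 32),
    "awe": (None, 92), "bored": (None, 92), "calm": (None, 93),
    "child": (None, 28), "child_directed": (None, 38), "confused": (94, 66),
    "default": (133, 158), "desire": (None, 92), "disgusted": (None, 118),
    "enunciated": (116, 62), "fast": (None, 98), "fearful": (None, 98),
    "happy": (74, 92), "laughing": (94, 103), "narration": (21, 76),
    "non_verbal": (None, 32), "projected": (None, 94), "sad": (81, 101),
    "sarcastic": (None, 106), "singing": (None, 4), "sleepy": (None, 93),
    "sympathetic": (None, 100), "whisper": (79, 86),
}

read_styles = [s for s, (read, _) in MINUTES.items() if read is not None]
improvised_styles = [s for s, (_, imp) in MINUTES.items() if imp is not None]

read_minutes = sum(read for read, _ in MINUTES.values() if read is not None)
improvised_minutes = sum(imp for _, imp in MINUTES.values() if imp is not None)
total_minutes = read_minutes + improvised_minutes

print(f"read styles: {len(read_styles)}, improvised styles: {len(improvised_styles)}")
print(f"read: {read_minutes / 60:.1f} h")
print(f"improvised: {improvised_minutes / 60:.1f} h")
print(f"total: {total_minutes / 60:.1f} h")
```

The recomputed values match the table's totals row (11.5 h, 34.4 h, 45.9 h) and the stated counts of 8 read styles and 26 improvised styles.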

The content above was collected and summarized by 遇见数据集.