
JHU-SmileLab/NaturalVoices_EVC

Source: Hugging Face. Updated 2025-11-11. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/JHU-SmileLab/NaturalVoices_EVC
Resource description:
---
task_categories:
- audio-to-audio
- text-to-speech
- audio-classification
language:
- en
---

# NaturalVoices EVC

A large emotional voice conversion (EVC) dataset curated from spontaneous, in-the-wild podcast speech as part of the **NaturalVoices** project, in collaboration with 🤗[MSP Lab at CMU LTI](https://huggingface.co/Lab-MSP). This release provides the emotion-balanced subset of the NaturalVoices **870-hour** VC dataset. It is intended for training and evaluating emotion-aware voice conversion systems, but its use is not limited to VC tasks.

- 📄 Paper: *NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion* (https://arxiv.org/abs/2511.00256)
- 🧺 Dataset collection (related subsets, e.g., 10% of data & emotional VC): https://huggingface.co/collections/JHU-SmileLab/naturalvoices-voice-conversion-datasets
- <span style="display:inline-flex;align-items:center;gap:-6px"> <img src="https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white" height=20 alt="GitHub badge"> <span>The extensive (unfiltered) NaturalVoices dataset and the code for the data collection & curation pipeline: <a href="https://github.com/Lab-MSP/NaturalVoices">https://github.com/Lab-MSP/NaturalVoices</a></span> </span>

## Dataset Summary

NaturalVoices VC compiles real-life, expressive podcast speech and provides automatic **annotations** designed for VC research (e.g., **emotion** attributes, **speaker identity**, **speech quality**, **transcripts**). The broader NaturalVoices corpus contains thousands of hours of podcast speech; this repository hosts the **EVC** subset.

**What’s in this repo**

- ~370 hours of podcast speech tailored and preprocessed for EVC.
- A balanced distribution of categorical emotions (Angry, Happy, Neutral, Sad).
- A wide range of speakers, annotated both manually and automatically.
- An annotations archive with per-utterance annotations, including:
  - Emotion categorical labels & dimensional attributes (valence/arousal/dominance),
  - Speech quality indicators,
  - Text, Gender, and Duration.

### Subsets

| Subset | Description | Link |
| --- | :--- | --- |
| NaturalVoices_VC_870h | 870h of speech data curated for VC | 🤗[JHU-SmileLab/NaturalVoices_VC_870h](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_VC_870h) |
| NaturalVoices_EVC | Emotion-balanced subset for Emotional Voice Conversion (EVC) | This repo |
| NaturalVoices_VC_0.1 (10%) | A smaller subset uniformly sampled from 870h (10%) | 🤗[JHU-SmileLab/NaturalVoices_VC_0.1](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_VC_0.1) |

## How to use

You can download the dataset directly with the following command:

```bash
huggingface-cli download JHU-SmileLab/NaturalVoices_EVC --repo-type=dataset --local-dir=YOUR_LOCAL_DIR
```

*Streaming support will be available.*

## Cite & Contribute

If you use this dataset, please cite the paper:

```bibtex
@misc{du2025naturalvoiceslargescalespontaneousemotional,
  title={NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion},
  author={Zongyang Du and Shreeram Suresh Chandra and Ismail Rasim Ulgen and Aurosweta Mahapatra and Ali N. Salman and Carlos Busso and Berrak Sisman},
  year={2025},
  eprint={2511.00256},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2511.00256},
}
```
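
For scripted setups, the download command from "How to use" can also be issued from Python. The following is a minimal sketch, not part of the original dataset card: it assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The helper names `build_download_kwargs` and `download`, and any glob patterns passed in, are hypothetical, since this card does not document the repository's file layout.

```python
from pathlib import Path

# Repository ID as given in this dataset card.
REPO_ID = "JHU-SmileLab/NaturalVoices_EVC"


def build_download_kwargs(local_dir, allow_patterns=None):
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper: it mirrors the CLI flags shown in "How to use".
    """
    kwargs = {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # equivalent to --repo-type=dataset
        "local_dir": str(Path(local_dir)),  # equivalent to --local-dir
    }
    if allow_patterns is not None:
        # Optional glob filter. The actual file names in the repository are
        # not documented here, so any pattern is the caller's assumption.
        kwargs["allow_patterns"] = list(allow_patterns)
    return kwargs


def download(local_dir, allow_patterns=None):
    """Download the dataset snapshot and return the local folder path."""
    # Imported lazily so build_download_kwargs works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(local_dir, allow_patterns))
```

Calling `download("YOUR_LOCAL_DIR")` should fetch the same files as the CLI command. Passing `allow_patterns` restricts the download to matching files, which may help when only part of the repository is needed.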
Provider:
JHU-SmileLab