JHU-SmileLab/NaturalVoices_VC_0.1
Hugging Face | Updated: 2025-11-11 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/JHU-SmileLab/NaturalVoices_VC_0.1
Resource description:
---
task_categories:
- audio-to-audio
- text-to-speech
- audio-classification
- automatic-speech-recognition
language:
- en
---
# NaturalVoices VC 10%
A large voice conversion (VC) dataset curated from spontaneous, in-the-wild podcast speech as part of the **NaturalVoices** project, in collaboration with 🤗[MSP Lab at CMU LTI](https://huggingface.co/Lab-MSP). This release provides a 10% subset uniformly sampled from the **870-hour** VC dataset. The dataset and its subsets are mainly intended for training and evaluating emotion-aware voice conversion systems, but they are not limited to VC tasks.
- 📄 Paper: *NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion*, https://arxiv.org/abs/2511.00256
- 🧺 Dataset collection (related subsets, e.g., 10% of data & emotional VC): https://huggingface.co/collections/JHU-SmileLab/naturalvoices-voice-conversion-datasets
- <span style="display:inline-flex;align-items:center;gap:-6px">
<img src="https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white" height=20 alt="GitHub badge">
<span>The extensive (unfiltered) NaturalVoices dataset and the code for the data collection & curation pipeline: <a href="https://github.com/Lab-MSP/NaturalVoices">https://github.com/Lab-MSP/NaturalVoices</a></span>
</span>
## Dataset Summary
NaturalVoices VC compiles real-life, expressive podcast speech and provides automatic **annotations** designed for VC research (e.g., **emotion** attributes, **speaker identity**, **speech quality**, **transcripts**). The broader NaturalVoices corpus contains thousands of hours of podcast speech; this repository hosts the **VC_01** subset.
**What’s in this repo**
- ~90 hours of podcast speech tailored and preprocessed for VC.
- A wide range of speakers, both manually & automatically annotated.
- Annotations archive with per-utterance annotations including:
  - Emotion categorical labels & dimensional attributes (valence/arousal/dominance),
  - Speech quality indicators,
  - Text, Gender, and Duration.
### Subsets
| Subset | Description | Link |
| --------------------------- | :------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| NaturalVoices_VC_870h       | 870h of speech data curated for VC                             | 🤗[JHU-SmileLab/NaturalVoices_VC_870h](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_VC_870h) |
| NaturalVoices_EVC | Emotion-balanced subset for Emotional Voice Conversion (EVC) | 🤗[JHU-SmileLab/NaturalVoices_EVC](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_EVC) |
| NaturalVoices_VC_01 (10%) | A smaller subset uniformly sampled from 870h (10%) | This repo |
## How to use
You can directly download the dataset using the following command:
```bash
huggingface-cli download JHU-SmileLab/NaturalVoices_VC_0.1 --repo-type=dataset --local-dir=YOUR_LOCAL_DIR
```
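After the download finishes, a quick sanity check can confirm that files actually arrived. The sketch below is not part of the official tooling: `summarize_download` is a hypothetical helper that counts files per extension under the local directory. It makes no assumption about the repository layout, and it skips the `.cache` folder that recent `huggingface_hub` versions may create inside `--local-dir`.

```bash
# Hypothetical helper (not part of huggingface-cli): print "<extension><TAB><count>"
# for every regular file under the given directory, sorted by extension.
summarize_download() {
  local dir="$1"
  # Skip the .cache folder that the downloader may create inside the target directory.
  find "$dir" -type f ! -path '*/.cache/*' \
    | awk -F/ '{
        # $NF is the file name; the text after its last dot is treated as the extension.
        n = split($NF, parts, ".")
        ext = (n > 1) ? parts[n] : "(none)"
        count[ext]++
      }
      END { for (e in count) printf "%s\t%d\n", e, count[e] }' \
    | sort
}

# Usage, with YOUR_LOCAL_DIR being the directory passed to --local-dir:
#   summarize_download YOUR_LOCAL_DIR
```

An empty result means nothing was downloaded into that directory, which usually points to a wrong path or an interrupted transfer.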
*Streaming support will be available*
## Cite & Contribute
If you use this dataset, please cite the paper:
```bibtex
@misc{du2025naturalvoiceslargescalespontaneousemotional,
title={NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion},
author={Zongyang Du and Shreeram Suresh Chandra and Ismail Rasim Ulgen and Aurosweta Mahapatra and Ali N. Salman and Carlos Busso and Berrak Sisman},
year={2025},
eprint={2511.00256},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2511.00256},
}
```
Provided by:
JHU-SmileLab



