JHU-SmileLab/NaturalVoices_VC_0.1

Name: JHU-SmileLab/NaturalVoices_VC_0.1
Creator: JHU-SmileLab
Published: 2025-11-11 07:14:19
License: Not specified

Source: Hugging Face. Updated 2025-11-11; indexed 2026-01-03.

Download link:

https://hf-mirror.com/datasets/JHU-SmileLab/NaturalVoices_VC_0.1

Description:

---
task_categories:
- audio-to-audio
- text-to-speech
- audio-classification
- automatic-speech-recognition
language:
- en
---

# NaturalVoices VC 10%

A large voice conversion (VC) dataset curated from spontaneous, in-the-wild podcast speech as part of the **NaturalVoices** project, in collaboration with 🤗[MSP Lab at CMU LTI](https://huggingface.co/Lab-MSP).

This release provides the 10% subset, uniformly sampled from the **870-hour** VC dataset. It is mainly intended for training and evaluating emotion-aware voice conversion systems, but it is not limited to VC tasks.

- 📄 Paper: *NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion* (https://arxiv.org/abs/2511.00256)
- 🧺 Dataset collection (related subsets, e.g., 10% of data & emotional VC): https://huggingface.co/collections/JHU-SmileLab/naturalvoices-voice-conversion-datasets
- GitHub: the extensive (unfiltered) NaturalVoices dataset and the code for the data collection & curation pipeline: https://github.com/Lab-MSP/NaturalVoices

## Dataset Summary

NaturalVoices VC compiles real-life, expressive podcast speech and provides automatic **annotations** designed for VC research (e.g., **emotion** attributes, **speaker identity**, **speech quality**, **transcripts**). The broader NaturalVoices corpus contains thousands of hours of podcast speech; this repository hosts the **VC_01** subset.

**What's in this repo**

- ~90 hours of podcast speech tailored and preprocessed for VC.
- A wide range of speakers, both manually & automatically annotated.
- Annotations archive with per-utterance annotations including:
  - Emotion categorical labels & dimensional attributes (valence/arousal/dominance),
  - Speech quality indicators,
  - Text, Gender, and Duration.

### Subsets

| Subset | Description | Link |
| --------------------------- | :------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| NaturalVoices_VC_870h | 870h of speech data curated for VC | 🤗[JHU-SmileLab/NaturalVoices_VC_870h](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_VC_870h) |
| NaturalVoices_EVC | Emotion-balanced subset for Emotional Voice Conversion (EVC) | 🤗[JHU-SmileLab/NaturalVoices_EVC](https://huggingface.co/datasets/JHU-SmileLab/NaturalVoices_EVC) |
| NaturalVoices_VC_01 (10%) | A smaller subset uniformly sampled from 870h (10%) | This repo |

## How to use

You can download the dataset directly with the following command:

```bash
huggingface-cli download JHU-SmileLab/NaturalVoices_VC_0.1 --repo-type=dataset --local-dir=YOUR_LOCAL_DIR
```

*Streaming support will be available.*

## Cite & Contribute

If you use this dataset, please cite the paper:

```bibtex
@misc{du2025naturalvoiceslargescalespontaneousemotional,
  title={NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion},
  author={Zongyang Du and Shreeram Suresh Chandra and Ismail Rasim Ulgen and Aurosweta Mahapatra and Ali N. Salman and Carlos Busso and Berrak Sisman},
  year={2025},
  eprint={2511.00256},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2511.00256},
}
```
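
For scripted downloads, the CLI command under "How to use" can also be expressed in Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper names `download_kwargs` and `download_dataset` are hypothetical and not part of the dataset card.

```python
REPO_ID = "JHU-SmileLab/NaturalVoices_VC_0.1"  # repository name from the dataset card


def download_kwargs(local_dir: str) -> dict:
    """Build keyword arguments that mirror the flags of the CLI command."""
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # equivalent to --repo-type=dataset
        "local_dir": local_dir,  # equivalent to --local-dir=YOUR_LOCAL_DIR
    }


def download_dataset(local_dir: str) -> str:
    """Download a snapshot of the whole repository and return its local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download_dataset("YOUR_LOCAL_DIR")` needs network access and enough disk space for the full subset. The card does not describe the file layout inside the repository, so inspect the downloaded directory before writing a loader.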

Provider:

JHU-SmileLab
