voices-with-captions
ModelScope Community, updated 2025-12-05, indexed 2025-11-15
Download link: https://modelscope.cn/datasets/laion/voices-with-captions
# 🗣️ Synthetic Voice Description Dataset for Voice Conversion
## Overview
This dataset contains **3,180 synthetically generated voice samples**, each paired with a **caption** that roughly describes how the voice sounds. Captions typically include the speaker's **age**, **gender**, and **accent**, and sometimes **general vocal qualities** (e.g., "old woman, Irish accent").
All voices are **synthetically generated** and do not represent any real person. This makes the dataset ideal for training and evaluating **voice conversion models** without infringing on personhood or identity rights.
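Because the captions are free text rather than structured fields, a small best-effort parser can split most of them into age, gender, and accent. This is a sketch under assumptions: `parse_caption`, `VoiceAttributes`, and the word lists are hypothetical, inferred only from the example captions in this card; words describing other vocal qualities are simply left in the raw caption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies inferred from the example captions; the dataset
# card does not define a fixed label set.
AGE_WORDS = {"young", "middle-aged", "old"}
GENDER_WORDS = {"man", "woman"}
ACCENT_SUFFIX = " accent"


@dataclass
class VoiceAttributes:
    age: Optional[str]
    gender: Optional[str]
    accent: Optional[str]
    raw: str  # original caption, kept for anything the parser ignores


def parse_caption(caption: str) -> VoiceAttributes:
    """Best-effort split of a caption such as 'old woman, Irish accent'."""
    age = gender = accent = None
    for part in (p.strip() for p in caption.split(",")):
        if part.lower().endswith(ACCENT_SUFFIX):
            # "Irish accent" -> "Irish"; multi-word accents are kept whole.
            accent = part[: -len(ACCENT_SUFFIX)].strip()
            continue
        for word in part.lower().split():
            if word in AGE_WORDS:
                age = word
            elif word in GENDER_WORDS:
                gender = word
    return VoiceAttributes(age=age, gender=gender, accent=accent, raw=caption)


print(parse_caption("old man, valley girl accent"))
# VoiceAttributes(age='old', gender='man', accent='valley girl', raw='old man, valley girl accent')
```

Fields the parser cannot recognize stay `None`, so downstream code should treat the structured attributes as hints and fall back to `raw` when needed.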
---
## 📁 Dataset Contents
- A **CSV file** containing pairs of:
  - `description`: a short text caption describing the voice
  - `matched_filename`: the name of the corresponding `.wav` file
**Example rows from the CSV:**
```csv
description,matched_filename
"old woman, Irish accent",00399-000.wav
"old man, valley girl accent",01074-001.wav
"young man, Scottish accent",18297-001.wav
"middle-aged man, Chinese accent",00952-002.wav
```
- A **TAR archive** containing:
  - The `voice-captions.csv` file (as described above)
  - All 3,180 corresponding `.wav` audio files
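A minimal sketch for unpacking the archive and pairing each caption with its audio file, using only Python's standard `tarfile` and `csv` modules. The archive's filename and internal folder layout are not stated here, so the hypothetical helper `load_voice_pairs` searches the extracted tree for `voice-captions.csv` and indexes every `.wav` file by its bare name.

```python
import csv
import tarfile
from pathlib import Path


def load_voice_pairs(archive_path, extract_dir):
    """Extract the archive and return (pairs, missing).

    pairs:   list of (description, Path to .wav) for rows whose file was found
    missing: matched_filename values listed in the CSV but absent from the archive
    """
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:  # default mode "r" auto-detects compression
        # Use the safer "data" extraction filter on Python versions that support it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(extract_dir, **kwargs)

    csv_files = sorted(extract_dir.rglob("voice-captions.csv"))
    if not csv_files:
        raise FileNotFoundError("voice-captions.csv not found in the archive")

    # Index audio by bare filename, since the internal folder layout is unknown.
    wav_index = {p.name: p for p in extract_dir.rglob("*.wav")}

    pairs, missing = [], []
    # "utf-8-sig" also handles a CSV saved with a byte-order mark.
    with open(csv_files[0], newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            name = row["matched_filename"].strip()
            wav = wav_index.get(name)
            if wav is None:
                missing.append(name)
            else:
                pairs.append((row["description"], wav))
    return pairs, missing
```

For example, `pairs, missing = load_voice_pairs("voices-with-captions.tar", "data/")`, where the archive name is a placeholder. An empty `missing` list is a quick sanity check that every CSV row has its audio file.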
---
## 🎯 Purpose
The dataset is designed to serve as a set of **target voice files** for voice conversion tasks. It can be used to:
- Convert existing voice data into synthetic voices with known, described properties
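To choose target voices with particular properties, one option is to filter the CSV by caption keywords. The sketch below uses a hypothetical helper, `find_target_voices`, and assumes the `description` and `matched_filename` columns shown in the example CSV. It matches whole words case-insensitively, so `man` does not also match `woman`.

```python
import csv
import re
from typing import List


def find_target_voices(csv_path: str, *terms: str) -> List[str]:
    """Return matched_filename values whose description contains every term.

    Matching is case-insensitive and on whole words, so "man" does not
    match "woman".
    """
    patterns = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms]
    matches = []
    # "utf-8-sig" also handles a CSV saved with a byte-order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            if all(p.search(row["description"]) for p in patterns):
                matches.append(row["matched_filename"])
    return matches
```

For example, `find_target_voices("voice-captions.csv", "young", "Scottish")` lists the candidate target files for a young speaker with a Scottish accent. Because captions are free text, keyword filtering is approximate, and listening to the returned files is still advisable.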
---
## ⚙️ Google Colab for Seed-VC Fine-Tuning
A community-developed fine-tuning notebook for **Seed-VC** (a voice conversion model) is available on Google Colab:
🔗 **[Seed-VC Fine-Tune Notebook](https://colab.research.google.com/drive/1HeJgMIRpEMd87z5oAcfBfS8_YRLvrwr9?usp=sharing)**
This notebook allows you to take real or synthetic voice data and convert it into one of the target voices.
Provider: maas
Created: 2025-10-14



