voices-with-captions

Name: voices-with-captions
Creator: maas
Published: 2025-12-05 16:54:46
License: No description

ModelScope Community · updated 2025-12-05 · indexed 2025-11-15

Download link:

https://modelscope.cn/datasets/laion/voices-with-captions

Resource summary:

# 🗣️ Synthetic Voice Description Dataset for Voice Conversion

## Overview

This dataset contains **3,180 synthetically generated voice samples**, each paired with a **caption** that roughly describes how the voice sounds. Captions typically include the speaker's **age**, **gender**, and **accent**, and sometimes **general vocal qualities** (e.g., "old woman, Irish accent").

All voices are **synthetically generated** and do not represent any real person. This makes the dataset well suited to training and evaluating **voice conversion models** without infringing on personhood or identity rights.

---

## 📁 Dataset Contents

- A **CSV file** containing pairs of:
  - `description`: a short text caption describing the voice
  - `matched_filename`: the name of the corresponding `.wav` file

**Example rows from the CSV:**

```csv
description,matched_filename
"old woman, Irish accent",00399-000.wav
"old man, valley girl accent",01074-001.wav
"young man, Scottish accent",18297-001.wav
"middle-aged man, Chinese accent",00952-002.wav
```

- A **TAR archive** that contains:
  - the `voice-captions.csv` file (as described above)
  - all 3,180 corresponding `.wav` audio files

---

## 🎯 Purpose

The dataset is designed to serve as a set of **target files** for voice conversion tasks. It can be used to:

- convert existing voice data into synthetic voices with known, described properties.

---

## ⚙️ Google Colab for Seed-VC Fine-Tuning

A community-developed fine-tuning notebook for **Seed-VC** (a voice conversion model) is available on Google Colab:

🔗 **[Seed-VC Fine-Tune Notebook](https://colab.research.google.com/drive/1HeJgMIRpEMd87z5oAcfBfS8_YRLvrwr9?usp=sharing)**

This notebook lets you take real or synthetic voice data and convert it into one of the target voices.
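The layout described under Dataset Contents (a `voice-captions.csv` index plus the `.wav` files, packed together in one TAR archive) can be read with Python's standard library alone. The sketch below is a minimal, hypothetical set of helpers, not something shipped with the dataset. It assumes that the archive's filename and internal folder structure are unknown (neither is documented), so members are matched by basename; the helper names `load_caption_index`, `read_wav_bytes`, and `split_caption` are illustrative; and splitting captions on commas is only a heuristic, since captions are free-form text.

```python
import csv
import io
import tarfile
from pathlib import PurePosixPath


def _find_member(tar: tarfile.TarFile, basename: str) -> tarfile.TarInfo:
    """Find a regular file in the archive by basename (internal folders are not documented)."""
    for member in tar.getmembers():
        if member.isfile() and PurePosixPath(member.name).name == basename:
            return member
    raise FileNotFoundError(f"{basename} not found in archive")


def load_caption_index(tar_path: str, csv_name: str = "voice-captions.csv") -> dict[str, str]:
    """Map each .wav filename to its caption, read from the CSV inside the TAR."""
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" also accepts gzip/bz2/xz-compressed TARs
        raw = tar.extractfile(_find_member(tar, csv_name)).read()
    # utf-8-sig strips a leading byte-order mark if one is present
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    return {row["matched_filename"]: row["description"] for row in reader}


def read_wav_bytes(tar_path: str, filename: str) -> bytes:
    """Return the raw bytes of one .wav file, e.g. to hand to an audio decoder."""
    with tarfile.open(tar_path, "r:*") as tar:
        return tar.extractfile(_find_member(tar, filename)).read()


def split_caption(description: str) -> list[str]:
    """Heuristically split a free-form caption into its comma-separated parts."""
    return [part.strip() for part in description.split(",") if part.strip()]
```

For example, if the archive matches the example rows above, `load_caption_index("voices.tar")["00399-000.wav"]` would return `"old woman, Irish accent"` (`voices.tar` is a placeholder path). Because `read_wav_bytes` rescans the archive on every call, extracting the whole TAR once is likely cheaper when loading many files.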

Provider:

maas

Created:

2025-10-14