laions_got_talent
收藏魔搭社区2025-12-04 更新2025-10-11 收录
下载链接:
https://modelscope.cn/datasets/laion/laions_got_talent
下载链接
链接失效反馈官方服务:
资源简介:
# LAION's Got Talent: Generated Voice Acting Dataset
## Overview
"LAION's Got Talent" is a generated dataset comprising voice acting samples that exhibit a wide range of emotions, vocal bursts, topics, and content. This dataset is a component of the BUD-E project, spearheaded by LAION with support from Intel.
## Dataset Composition
The dataset includes:
- **Emotional Diversity:** Samples portraying various emotions to facilitate research in emotional recognition and synthesis.
- **Vocal Bursts:** Instances of non-verbal vocal expressions, such as laughter, sighs, and gasps.
- **Topical Variety:** Content covering multiple subjects to support diverse applications.
( currently 110 hours, will grow soon )
## Purpose
This dataset aims to advance the development of empathetic and context-aware AI voice assistants. By providing a rich array of vocal expressions, it serves as a valuable resource for training models that can understand and generate natural, emotionally nuanced speech.
## BUD-E Project
BUD-E (Buddy for Understanding and Digital Empathy) is an open-source AI voice assistant project focused on enhancing conversational quality, naturalness, and empathy.
Detailed documentation and analysis of the dataset will be provided in subsequent publications. Researchers and developers are encouraged to utilize this dataset to further the capabilities of AI voice assistants and related technologies.
## Construction
The dataset was constructed wiht a diverse menu of prompts the OpenAI Voice API via Hyprlab (https://docs.hyprlab.io/browse-models/model-list/openai/chat#gpt-4o-audio-models).
## Acknowledgments
This dataset was developed as part of the BUD-E project, led by LAION with support from Intel. We extend our gratitude to all contributors and collaborators involved in this initiative.
# LAION's Got Talent:生成式配音数据集
## 概述
“LAION达人秀”是一款生成式数据集,内含覆盖丰富情感、非语言发声、多样主题与多元内容的配音样本。本数据集是LAION主导、英特尔(Intel)支持的BUD-E项目的组成部分。
## 数据集构成
- **情感多样性**:包含各类情感表达的样本,可用于情感识别与合成相关研究。
- **非语言发声片段**:涵盖笑声、叹息、喘息等非语言发声实例。
- **主题多样性**:覆盖多领域内容,可支撑多样化应用场景。
(当前规模为110小时,后续将持续扩充)
## 项目目标
本数据集旨在推动具备共情能力与上下文感知能力的AI语音助手技术发展。通过提供丰富多元的语音表达资源,本数据集可作为训练模型的宝贵支撑数据,助力模型理解并生成自然且富有情感层次的语音内容。
## BUD-E项目
BUD-E(全称Buddy for Understanding and Digital Empathy,即“理解与数字共情伙伴”)是一款开源AI语音助手项目,致力于提升对话质量、语音自然度与共情能力。
本数据集的详细文档与分析将在后续发表的学术出版物中公布。欢迎广大研究者与开发者使用本数据集,进一步提升AI语音助手及相关技术的能力水平。
## 数据集构建
本数据集通过Hyprlab平台调用OpenAI语音API(OpenAI Voice API),基于多样化提示词生成构建完成。数据集构建所使用的Hyprlab平台官方文档链接为:https://docs.hyprlab.io/browse-models/model-list/openai/chat#gpt-4o-audio-models。
## 致谢
本数据集作为LAION主导、英特尔(Intel)支持的BUD-E项目的一部分开发完成。我们向所有参与本项目的贡献者与合作者致以诚挚谢意。
提供机构:
maas
创建时间:
2025-10-04



