WavCaps

Name: WavCaps
Creator: University of Surrey
Published: 2023-03-30 22:07:47
License: No description available

arXiv · Updated: 2023-03-30 · Indexed: 2024-06-21

Download link:

https://github.com/XinhaoMei/WavCaps

Resource overview:

WavCaps is the first large-scale weakly-labeled audio captioning dataset. It was created by the University of Surrey and other institutions and contains approximately 400,000 audio clips with paired captions. The data is sourced from web resources and sound event detection datasets, and ChatGPT is used to automatically filter the raw descriptions and convert them into high-quality captions. The dataset is suitable for a range of audio-language multimodal learning tasks. It aims to address the limited scale of existing audio-language datasets and to advance audio-language multimodal research.

Provided by:

University of Surrey

Created:

2023-03-30

Collected summary

Dataset introduction