cvssp/WavCaps

Name: cvssp/WavCaps
Creator: cvssp
Published: 2023-07-06 13:28:10
License: No description available

Hugging Face | Updated 2023-07-06 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/cvssp/WavCaps

Resource description:

---
license: cc-by-4.0
language:
- en
size_categories:
- 100B<n<1T
---

# WavCaps

WavCaps is a ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research. The audio clips are sourced from three websites ([FreeSound](https://freesound.org/), [BBC Sound Effects](https://sound-effects.bbcrewind.co.uk/), and [SoundBible](https://soundbible.com/)) and a sound event detection dataset ([AudioSet Strongly-labelled Subset](https://research.google.com/audioset/download_strong.html)).

- **Paper:** https://arxiv.org/abs/2303.17395
- **Github:** https://github.com/XinhaoMei/WavCaps

## Statistics

| Data Source        | # audio  | avg. audio duration (s) | avg. text length |
|--------------------|----------|-------------------------|------------------|
| FreeSound          | 262300   | 85.98                   | 6.77             |
| BBC Sound Effects  | 31201    | 115.04                  | 9.67             |
| SoundBible         | 1232     | 13.12                   | 5.87             |
| AudioSet SL subset | 108317   | 10.00                   | 9.79             |
| WavCaps            | 403050   | 67.59                   | 7.80             |

## Download

We provide a json file for each data source. For audio clips sourced from websites, we provide the processed caption, the raw description, and other metadata. For audio clips from AudioSet, we use the version from PANNs, where each file name is prefixed with a 'Y'. For the start time, please refer to the original metadata of the AudioSet SL subset.

Waveforms in flac format can be downloaded from the [Zip_files](https://huggingface.co/datasets/cvssp/WavCaps/tree/main/Zip_files) directory.

Pretrained models can be downloaded [here](https://drive.google.com/drive/folders/1pFr8IRY3E1FAtc2zjYmeuSVY3M5a-Kdj?usp=share_link).

<font color='red'>If you get "error: invalid zip file with overlapped components (possible zip bomb)" when unzipping, please try the following commands: </font>

`zip -F AudioSet_SL.zip --out AS.zip`

`unzip AS.zip`

## License

Only academic uses are allowed for the WavCaps dataset. By downloading audio clips through the links provided in the json files, you agree that you will use the audio for research purposes only. For credits for audio clips from FreeSound, please refer to its own page.

For detailed license information, please refer to: [FreeSound](https://freesound.org/help/faq/#licenses), [BBC Sound Effects](https://sound-effects.bbcrewind.co.uk/licensing), [SoundBible](https://soundbible.com/about.php)

The models we provide are created under a UK data copyright exemption for non-commercial research.

## Code for related tasks

We provide code and pre-trained models for audio-language retrieval, automated audio captioning, and zero-shot audio classification.

* [Retrieval](https://github.com/XinhaoMei/WavCaps/tree/master/retrieval)
* [Captioning](https://github.com/XinhaoMei/WavCaps/tree/master/captioning)
* [Zero-shot Audio Classification](https://github.com/XinhaoMei/WavCaps/blob/master/retrieval/zero_shot_classification.py)
* [Text-to-Sound Generation](https://github.com/haoheliu/AudioLDM)

## Citation

Please cite the following if you make use of the dataset.

```bibtex
@article{mei2023wavcaps,
  title={WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research},
  author={Mei, Xinhao and Meng, Chutong and Liu, Haohe and Kong, Qiuqiang and Ko, Tom and Zhao, Chengqi and Plumbley, Mark D and Zou, Yuexian and Wang, Wenwu},
  journal={arXiv preprint arXiv:2303.17395},
  year={2023}
}
```


Provider:

cvssp

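Summary of original information

WavCaps dataset overview

Dataset description

WavCaps is a ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research. The audio clips come from four sources: FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset.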

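Dataset statistics

| Data source        | Number of clips | Avg. audio duration (s) | Avg. text length |
|--------------------|-----------------|-------------------------|------------------|
| FreeSound          | 262300          | 85.98                   | 6.77             |
| BBC Sound Effects  | 31201           | 115.04                  | 9.67             |
| SoundBible         | 1232            | 13.12                   | 5.87             |
| AudioSet SL subset | 108317          | 10.00                   | 9.79             |
| WavCaps            | 403050          | 67.59                   | 7.80             |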
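The WavCaps row can be cross-checked against the four per-source rows. The sketch below assumes that the WavCaps averages are means weighted by clip count, which the table itself does not state. The total duration in hours is derived here from the rounded table values and is only approximate.

```python
# Per-source rows copied from the statistics table:
# (name, number of clips, avg. duration in seconds, avg. text length)
ROWS = [
    ("FreeSound", 262300, 85.98, 6.77),
    ("BBC Sound Effects", 31201, 115.04, 9.67),
    ("SoundBible", 1232, 13.12, 5.87),
    ("AudioSet SL subset", 108317, 10.00, 9.79),
]


def summarize(rows):
    """Return (total clips, weighted avg. duration, weighted avg. text length)."""
    total = sum(n for _, n, _, _ in rows)
    avg_duration = sum(n * d for _, n, d, _ in rows) / total
    avg_text_len = sum(n * t for _, n, _, t in rows) / total
    return total, avg_duration, avg_text_len


total, avg_duration, avg_text_len = summarize(ROWS)
# Approximate total audio length, computed from the rounded averages above.
total_hours = total * avg_duration / 3600

print(f"clips: {total}")
print(f"avg. duration (s): {avg_duration:.2f}")
print(f"avg. text length: {avg_text_len:.2f}")
print(f"approx. total hours: {total_hours:.0f}")
```

Under that assumption the recomputed values agree with the WavCaps row (403050 clips, 67.59 s, 7.80), and the implied total is on the order of 7,500 hours of audio.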

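Download information

The dataset provides one JSON file per data source. For audio clips taken from the websites, each file contains the processed caption, the raw description, and other metadata. For audio clips from AudioSet, the PANNs version is used, in which every file name is prefixed with 'Y'. The audio is provided in FLAC format and can be downloaded from the Zip_files directory.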
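Reading one of the per-source JSON files could look roughly like the sketch below. The key names (`data`, `id`, `caption`) and the sample records are assumptions made for illustration; they are not confirmed by this page and should be checked against the actual files.

```python
import json
import tempfile
from pathlib import Path


def load_captions(json_path, records_key="data", id_key="id", caption_key="caption"):
    """Return {clip id: caption} from one per-source JSON file.

    The default key names are assumptions; adjust them to the real schema.
    """
    with open(json_path, encoding="utf-8") as f:
        payload = json.load(f)
    # Accept either a bare list of records or a dict that wraps the list.
    records = payload[records_key] if isinstance(payload, dict) else payload
    return {str(rec[id_key]): rec[caption_key] for rec in records}


# Tiny synthetic file standing in for a real metadata file.
sample = {
    "data": [
        {"id": "0001", "caption": "A dog barks twice."},
        {"id": "0002", "caption": "Rain falls on a roof."},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    captions = load_captions(path)

for clip_id, caption in captions.items():
    print(clip_id, caption)
```

Passing the key names as parameters keeps the helper usable if a source file turns out to use different field names.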

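License

The WavCaps dataset may be used for academic purposes only. By downloading audio clips through the provided links, you agree to use the audio for research purposes only. For detailed license information, see the licensing pages of FreeSound, BBC Sound Effects, and SoundBible.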

Citation

If you use the dataset, please cite the following paper:

@article{mei2023wavcaps,
  title={WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research},
  author={Mei, Xinhao and Meng, Chutong and Liu, Haohe and Kong, Qiuqiang and Ko, Tom and Zhao, Chengqi and Plumbley, Mark D and Zou, Yuexian and Wang, Wenwu},
  journal={arXiv preprint arXiv:2303.17395},
  year={2023}
}

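Curated summary

Dataset introduction

Construction

WavCaps was built with ChatGPT-assisted weak labelling of audio captions. The audio clips come from three websites (FreeSound, BBC Sound Effects, and SoundBible) and one sound event detection dataset (the AudioSet Strongly-labelled Subset). For clips from the websites, the metadata includes the processed caption together with the raw description and other fields. The AudioSet clips use the PANNs version, in which each file name is prefixed with 'Y'; their start times are not repeated in WavCaps and should be looked up in the original metadata of the AudioSet SL subset.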
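The 'Y' prefix matters when joining WavCaps file names with the original AudioSet metadata, for example to recover start times. A minimal sketch of the mapping follows; the identifier used in the example is made up, and the exact form of the identifiers in the real files is not specified on this page.

```python
PREFIX = "Y"


def to_wavcaps_name(audioset_id: str) -> str:
    """Add the 'Y' prefix used by the PANNs version of the AudioSet files."""
    return PREFIX + audioset_id


def to_audioset_id(wavcaps_name: str) -> str:
    """Strip exactly one leading 'Y' to recover the original identifier."""
    if not wavcaps_name.startswith(PREFIX):
        raise ValueError(f"expected a name starting with {PREFIX!r}: {wavcaps_name!r}")
    # Slice instead of lstrip: the original identifier may itself begin with 'Y'.
    return wavcaps_name[len(PREFIX):]


example_id = "Yab12cd34ef"  # hypothetical identifier that happens to start with 'Y'
name = to_wavcaps_name(example_id)
print(name)
print(to_audioset_id(name))
```

Removing only the first character avoids corrupting identifiers that legitimately start with 'Y'.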

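Usage

WavCaps is suited to several audio-language multimodal tasks, including audio-language retrieval, automated audio captioning, and zero-shot audio classification. Researchers can use the provided JSON files to access the audio clips and their captions, and can build on the released pre-trained models. The audio files can be downloaded from the Zip_files directory, and the dataset card gives commands for repairing an archive when unzipping fails with an "overlapped components" error.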
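The repair commands from the dataset card (`zip -F AudioSet_SL.zip --out AS.zip`, then `unzip AS.zip`) can be wrapped in a small script. The sketch below is only one possible wrapper: the function names and the `dry_run` switch are illustrative, and it assumes the `zip` and `unzip` command-line tools are installed.

```python
import subprocess


def repair_commands(archive="AudioSet_SL.zip", repaired="AS.zip"):
    """Build the two commands given in the dataset card."""
    return [
        ["zip", "-F", archive, "--out", repaired],  # rewrite the archive into a fixed copy
        ["unzip", repaired],                        # extract the fixed copy
    ]


def repair_and_unzip(archive="AudioSet_SL.zip", repaired="AS.zip", dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    commands = repair_commands(archive, repaired)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return commands


commands = repair_and_unzip(dry_run=True)
```

With `dry_run=True` nothing is executed, which makes it easy to confirm the file names before running the repair on a large archive.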

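Background and challenges

Background

WavCaps is a weakly-labelled audio captioning dataset built with the help of ChatGPT and designed for audio-language multimodal research. It brings together audio clips from FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset, covering a wide range of audio content. Its central question is how far weakly-labelled captions can advance audio-language research. WavCaps was proposed in 2023 by Xinhao Mei and colleagues and released as an arXiv preprint (arXiv:2303.17395).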

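Current challenges

Building WavCaps involved several challenges. First, the audio comes from different platforms and datasets, so quality and format are inconsistent, which complicates preprocessing. Second, the weakly-labelled captions are generated with ChatGPT's assistance, so their accuracy and diversity depend on the model's generation ability and on the reliability of the source descriptions. Third, aligning audio and text remains difficult: establishing meaningful semantic links between the two modalities is one of the main open problems for work on this dataset.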

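Common scenarios

Typical use cases

WavCaps is widely applicable in audio-language multimodal research, particularly for audio captioning and audio retrieval. By combining audio clips from several sources with weakly-labelled captions, it gives researchers a large resource for training and evaluating captioning models. It also supports zero-shot audio classification, in which a model classifies audio from categories it has not seen during training.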
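Zero-shot audio classification is commonly implemented by embedding the clip and one text prompt per class in a shared space, then choosing the class whose prompt is most similar to the clip. The sketch below illustrates that general idea with cosine similarity; it is not the repository's `zero_shot_classification.py`, and the three-dimensional embeddings are made-up stand-ins for the outputs of a trained audio-text model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(audio_emb, class_text_embs):
    """Return (best label, {label: similarity}) for one audio embedding."""
    scores = {label: cosine(audio_emb, emb) for label, emb in class_text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical text-prompt embeddings, one per class label.
class_text_embs = {
    "dog barking": [0.9, 0.1, 0.0],
    "rain": [0.0, 0.2, 0.9],
    "siren": [0.1, 0.9, 0.1],
}
# Hypothetical embedding of an unlabelled audio clip.
audio_emb = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(audio_emb, class_text_embs)
print(label)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because the class set is defined only by text prompts, new classes can be added by embedding new prompts, without retraining the model.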

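Academic problems addressed

WavCaps addresses key problems in audio-language multimodal research, in particular how to make effective use of weakly-labelled data and how to obtain accurate audio captions. By using ChatGPT to turn raw descriptions into captions, it aims to improve caption quality and diversity and to provide more reliable training data for audio captioning and audio retrieval. This approach also offers a template for building similar datasets in related areas.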

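Practical applications

WavCaps has practical potential in intelligent audio processing and multimedia content analysis. In an audio search system, for example, it can be used to train audio-text models that support text-based audio retrieval. It can also be used for automatic audio captioning systems that give visually impaired users a text description of audio content.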
