WaveFake

Name: WaveFake
Creator: OpenDataLab
Published: 2026-05-17 04:30:28
License: not specified

OpenDataLab, updated 2026-05-17, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/WaveFake

Resource description:

This dataset consists of 104,885 generated audio clips in 16-bit PCM WAV format. We examined multiple networks trained on two reference datasets. The first is LJSpeech, which contains 13,100 short audio clips (about 6 seconds each on average, roughly 24 hours in total) read by a single female speaker. It covers passages from seven non-fiction books, recorded with a MacBook Pro microphone. The second is JSUT, specifically its basic5000 corpus: 5,000 sentences covering all basic Japanese kanji (4.8 seconds on average, roughly 6.7 hours in total), recorded by a female native Japanese speaker in an anechoic room. Finally, we included samples from a complete text-to-speech pipeline: 16,283 phrases averaging 3.8 seconds, roughly 17.5 hours in total. Altogether, the dataset contains approximately 175 hours of generated audio. Note that we do not redistribute the reference data.
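The clip count, sample format, and total duration quoted above can be checked against a local copy. The following is a minimal sketch, assuming the archive has been unpacked into a directory of `.wav` files. The function names `clip_info` and `summarize` and the directory argument are hypothetical and not part of the dataset. It uses only Python's standard `wave` module.

```python
import wave
from pathlib import Path


def clip_info(path):
    """Return (sample_width_in_bytes, duration_in_seconds) for one WAV file."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getsampwidth(), frames / rate


def summarize(root):
    """Count the .wav clips under root and total their duration in hours.

    Raises ValueError on the first clip that is not 16-bit.
    `root` is a hypothetical local path to the unpacked dataset.
    """
    count, seconds = 0, 0.0
    for path in sorted(Path(root).rglob("*.wav")):
        width, duration = clip_info(path)
        if width != 2:  # 2 bytes per sample corresponds to 16-bit PCM
            raise ValueError(f"{path}: sample width is {width} bytes, not 2")
        count += 1
        seconds += duration
    return count, seconds / 3600.0
```

Calling `summarize` on the unpacked directory returns a `(clip_count, hours)` pair that can be compared with the figures in the description. The `wave` module reads uncompressed PCM only and raises `wave.Error` on other encodings.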

Provider:

OpenDataLab

Created:

2022-09-01

Collection summary

Dataset introduction