
SPEECHFAKE

ModelScope community | updated 2026-05-13 | indexed 2025-07-19
Download link:
https://modelscope.cn/datasets/inclusionAI/SPEECHFAKE
Resource description:
# SpeechFake

Copyright © 2025 Ant Group

## 1. Dataset Structure

```
SpeechFake/
|- BD/                        # Bilingual Dataset
|  |- BigVGAN
|  |  |- xxx/xxx.wav
|  |  |- ...
|  |- ...
|- MD/                        # Multilingual Dataset
|  |- CosyVoice
|  |  |- xxx/xxx.wav
|  |  |- ...
|  |- ...
|- Real/                      # Real Dataset
|  |- Aishell1
|  |  |- xxx/xxx.wav
|  |  |- ...
|  |- ...
|
|- metadata/
|  |- BD/
|  |  |- TTS_xxx.csv          # Metadata of each generator
|  |  |- ...
|  |
|  |- MD/
|  |  |- TTS_xxx_en.csv       # Metadata of each generator for each language
|  |  |- ...
|  |
|  |- Real/
|  |  |- Aishell1.csv         # Metadata of each real dataset
|  |  |- ...
|  |
|  |- experiments/            # Metadata of train/dev/test data used in experiments
|     |- baseline/
|     |  |- train_all.csv
|     |  |- dev_all.csv
|     |  |- test_all.csv
|     |  |- ...
|     |
|     |- cross_generator/
|     |  |- train_tts.csv
|     |  |- dev_tts.csv
|     |  |- test_tts.csv
|     |  |- ...
|     |
|     |- cross_lingual/
|     |  |- train.csv
|     |  |- dev.csv
|     |  |- test_en.csv
|     |  |- ...
|     |
|     |- cross_speaker/
|        |- train.csv
|        |- test_same_spk.csv
|        |- test_diff_spk.csv
|        |- ...
|
|- LICENSE.txt
|- README.md
```

## 2. Audio Description

All audio files are stored in WAV format at a 16 kHz sampling rate.

- `BD/`: Contains bilingual speech deepfakes and real audio in English and Chinese.
- `MD/`: Contains multilingual speech deepfakes in 46 languages.
- `Real/`: Contains real speech data sourced from LibriTTS, VCTK, AISHELL1, AISHELL3, and CommonVoice.

For detailed information and file lists, refer to the `metadata/` directory.

## 3. Metadata Format

All metadata files are provided in CSV format with the following columns:

1. `file`: Relative file path to the audio
2. `label`: `bonafide` (real audio) or `spoof` (fake audio)
3. `generator`: `TTS` (text-to-speech), `VC` (voice conversion), or `NV` (neural vocoder)
4. `model`: Name of the speech generation model
5. `speaker`: Speaker identity
6. `language`: Language code (e.g., `en`, `zh`, `es`)

The metadata directory is organized as follows:

- `metadata/BD` and `metadata/MD` contain metadata for each speech generator.
- `metadata/Real` contains metadata for each real dataset.
- `metadata/experiments` contains metadata for the training, development, and test splits used in the experiments described in the paper:
  - baseline
  - cross-generator
  - cross-lingual
  - cross-speaker

## 4. License

The SpeechFake dataset is released under the CC-BY-4.0 License. Please read `LICENSE.txt` for full details.

## 5. Citation

If you use this dataset in your work, please cite the following paper:

```
@inproceedings{huang2025speechfake,
  title={SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods},
  author={Huang, Wen and Gu, Yanmei and Wang, Zhiming and Zhu, Huijia and Qian, Yanmin},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={9985--9998},
  year={2025}
}
```
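The metadata columns listed under "Metadata Format" lend themselves to a small loader. The following is a minimal sketch, not part of the official release: it assumes each CSV starts with a header row carrying exactly those column names, and that the `file` column is relative to the dataset root (the README does not say what the path is relative to). The helper names `read_metadata`, `count_by`, and `resolve_audio_path` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from pathlib import Path

# Column names from the "Metadata Format" section; a header row is assumed.
COLUMNS = ("file", "label", "generator", "model", "speaker", "language")


def read_metadata(handle):
    """Parse one metadata CSV (assumed to have a header row) into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by(rows, *keys):
    """Count rows per combination of the given columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def resolve_audio_path(root, row):
    """Join the `file` column onto a dataset root (assumed to be the base directory)."""
    return Path(root) / row["file"]


# Made-up rows for illustration only; they are not taken from the dataset.
# How bona fide rows fill `generator` and `model` is not stated in the README,
# so leaving them empty here is a guess.
SAMPLE = """file,label,generator,model,speaker,language
BD/BigVGAN/a/0001.wav,spoof,NV,BigVGAN,spk1,en
MD/CosyVoice/b/0002.wav,spoof,TTS,CosyVoice,spk2,zh
Real/Aishell1/c/0003.wav,bonafide,,,spk3,zh
"""

rows = read_metadata(io.StringIO(SAMPLE))
label_counts = count_by(rows, "label")
spoof_rows = [r for r in rows if r["label"] == "spoof"]
first_path = resolve_audio_path("SpeechFake", rows[0])
```

To read an actual metadata file, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` sample; the UTF-8 encoding is likewise an assumption.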

Provider:
maas
Created:
2025-07-16