SPEECHFAKE
Source: ModelScope community (魔搭社区). Updated 2026-05-13; indexed 2025-07-19.
Download link:
https://modelscope.cn/datasets/inclusionAI/SPEECHFAKE
Resource description:
# SpeechFake
Copyright © 2025 Ant Group
## 1. Dataset Structure
```
SpeechFake/
|- BD/                          # Bilingual Dataset
|   |- BigVGAN
|   |   |- xxx/xxx.wav
|   |   |- ...
|   |- ...
|- MD/                          # Multilingual Dataset
|   |- CosyVoice
|   |   |- xxx/xxx.wav
|   |   |- ...
|   |- ...
|- Real/                        # Real Dataset
|   |- Aishell1
|   |   |- xxx/xxx.wav
|   |   |- ...
|   |- ...
|
|- metadata/
|   |- BD/
|   |   |- TTS_xxx.csv          # Metadata of each generator
|   |   |- ...
|   |
|   |- MD/
|   |   |- TTS_xxx_en.csv       # Metadata of each generator for each language
|   |   |- ...
|   |
|   |- Real/
|   |   |- Aishell1.csv         # Metadata of each real dataset
|   |   |- ...
|   |
|   |- experiments/             # Metadata of train/dev/test data used in experiments
|       |- baseline/
|       |   |- train_all.csv
|       |   |- dev_all.csv
|       |   |- test_all.csv
|       |   |- ...
|       |
|       |- cross_generator/
|       |   |- train_tts.csv
|       |   |- dev_tts.csv
|       |   |- test_tts.csv
|       |   |- ...
|       |
|       |- cross_lingual/
|       |   |- train.csv
|       |   |- dev.csv
|       |   |- test_en.csv
|       |   |- ...
|       |
|       |- cross_speaker/
|           |- train.csv
|           |- test_same_spk.csv
|           |- test_diff_spk.csv
|           |- ...
|
|- LICENSE.txt
|- README.md
```
## 2. Audio Description
All audio files are stored in WAV format at a 16 kHz sampling rate.
- `BD/`: Contains bilingual speech deepfakes and real audio in English and Chinese.
- `MD/`: Contains multilingual speech deepfakes in 46 languages.
- `Real/`: Contains real speech data sourced from LibriTTS, VCTK, AISHELL1, AISHELL3, and CommonVoice.
For detailed information and file lists, refer to the `metadata/` directory.
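
As a quick sanity check on the 16 kHz claim, the sketch below reads the sampling rate from each WAV header under a directory and reports any file that differs. It is a minimal illustration, not part of the dataset tooling: the helper names `wav_sample_rate` and `find_unexpected_rates` are hypothetical, and it assumes the files are PCM-encoded WAVs that Python's standard `wave` module can open.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 16_000  # sampling rate stated in the README


def wav_sample_rate(path):
    """Return the sampling rate (Hz) recorded in a WAV header.

    Assumption: the file is PCM-encoded; the `wave` module raises
    wave.Error for encodings it does not support.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def find_unexpected_rates(root, expected=EXPECTED_RATE_HZ):
    """Map every .wav under `root` whose rate differs from `expected` to its rate."""
    mismatches = {}
    for wav_path in sorted(Path(root).rglob("*.wav")):
        rate = wav_sample_rate(wav_path)
        if rate != expected:
            mismatches[wav_path] = rate
    return mismatches
```

Calling `find_unexpected_rates("SpeechFake/BD")` on a local copy would return an empty dictionary if every header under that directory reports 16 kHz.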
## 3. Metadata Format
All metadata files are provided in CSV format with the following columns:
1) `file`: Relative file path to the audio
2) `label`: `bonafide` (real audio) or `spoof` (fake audio)
3) `generator`: `TTS` (text-to-speech), `VC` (voice conversion), or `NV` (neural vocoder)
4) `model`: Name of the speech generation model
5) `speaker`: Speaker identity
6) `language`: Language code (e.g., `en`, `zh`, `es`)
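
The column layout above can be read with Python's standard `csv` module. The sketch below is illustrative only: it assumes each metadata file has a header row whose names match the six columns exactly, and the rows in `SAMPLE_CSV` (including the empty `generator` and `model` fields on the bonafide row) are hypothetical, not taken from the dataset. The helpers `load_metadata` and `count_by` are likewise made up for this example.

```python
import csv
import io
from collections import Counter

COLUMNS = ["file", "label", "generator", "model", "speaker", "language"]

# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """file,label,generator,model,speaker,language
BD/ModelA/spk1/0001.wav,spoof,TTS,ModelA,spk1,en
BD/ModelB/spk2/0002.wav,spoof,VC,ModelB,spk2,zh
Real/CorpusX/spk3/0003.wav,bonafide,,,spk3,en
"""


def load_metadata(handle):
    """Read metadata rows as dicts, checking that all six columns are present.

    Assumption: the CSV starts with a header row using the column names above.
    """
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by(rows, column):
    """Count rows per distinct value of `column`."""
    return Counter(row[column] for row in rows)


rows = load_metadata(io.StringIO(SAMPLE_CSV))
# Keep only spoofed utterances produced by a TTS generator.
spoof_tts = [r for r in rows if r["label"] == "spoof" and r["generator"] == "TTS"]
```

For a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`.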
The `metadata/` directory is organized as follows:
- `metadata/BD` and `metadata/MD` include metadata for each speech generator.
- `metadata/Real` includes metadata for each real dataset.
- `metadata/experiments` contains metadata for the training, development, and test splits used in the experiments described in the paper:
  - baseline
  - cross-generator
  - cross-lingual
  - cross-speaker
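
The split files listed in the directory tree start with `train`, `dev`, or `test` (for example `train_all.csv`, `dev_tts.csv`, `test_same_spk.csv`). Assuming that naming pattern holds for every file in an experiment directory, which the README does not state explicitly, a small helper can group them by split. The function name `group_split_files` is hypothetical.

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")


def group_split_files(experiment_dir):
    """Group the CSV files in one experiment directory by split.

    Assumption: the split is the part of the file name before the first
    underscore (or the whole stem, as in train.csv). Other files are ignored.
    """
    groups = {split: [] for split in SPLITS}
    for csv_path in sorted(Path(experiment_dir).glob("*.csv")):
        prefix = csv_path.stem.split("_", 1)[0]
        if prefix in groups:
            groups[prefix].append(csv_path)
    return groups
```

On a local copy, `group_split_files("SpeechFake/metadata/experiments/cross_speaker")` would return a dictionary with `train`, `dev`, and `test` keys, each holding the matching paths; a split with no files maps to an empty list.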
## 4. License
The SpeechFake dataset is released under the CC-BY-4.0 License.
Please read `LICENSE.txt` for full details.
## 5. Citation
If you use this dataset in your work, please cite the following paper:
```
@inproceedings{huang2025speechfake,
  title={SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods},
  author={Huang, Wen and Gu, Yanmei and Wang, Zhiming and Zhu, Huijia and Qian, Yanmin},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={9985--9998},
  year={2025}
}
```
Provider:
maas
Created:
2025-07-16



