FAD: A Chinese Dataset for Fake Audio Detection

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/6623226


Description:

Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios (SNRs). The FAD dataset can be used not only for fake audio detection but also for identifying the algorithm that generated a fake utterance, which supports audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods that generalize well remain challenging. The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD

The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other related studies. To better study the robustness of these methods under the noisy conditions encountered in real life, we construct a corresponding noisy dataset. The full FAD dataset consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way. There is no speaker overlap across these three subsets. Each test set is further divided into seen and unseen test sets. The unseen test sets evaluate how well methods generalize to unknown types. Both the real audio and the fake audio in the unseen test set are unknown to the model.

For the noisy version, we select three noise databases for simulation. Additive noise is added to each audio in the clean dataset at 5 different SNRs. The additive noise for the unseen test set and for the remaining subsets comes from different noise databases.
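The description does not state how the noise is scaled or which segment of a noise recording is used, so the following is only a minimal sketch of additive-noise mixing at a target SNR, using the standard power-ratio definition SNR = 10·log10(P_clean / P_noise). The function name `mix_at_snr`, the random-segment selection, and the tiling of short noise clips are assumptions made for illustration; they are not taken from the FAD baselines.

```python
import numpy as np

# The five SNR levels listed in the dataset description (in dB).
SNR_LEVELS_DB = (0, 5, 10, 15, 20)


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Return clean + scaled noise so the mixture has the requested SNR.

    Hypothetical helper: segment selection and tiling are assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise clip if it is shorter than the clean utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)

    # Pick a random noise segment with the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(segment ** 2)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean signal and noise segment must be non-silent")

    # Solve 10*log10(p_clean / (scale**2 * p_noise)) = snr_db for scale.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * segment


# Usage: create one noisy copy of a synthetic utterance per SNR level.
rng = np.random.default_rng(0)
utterance = np.sin(np.linspace(0.0, 200.0, 16000))  # stand-in for real speech
noise_clip = rng.standard_normal(8000)              # stand-in for a noise file
noisy_versions = {
    snr: mix_at_snr(utterance, noise_clip, snr, rng) for snr in SNR_LEVELS_DB
}
```

Lower SNR values yield a larger noise scale, so the 0 dB copy is the hardest condition and the 20 dB copy is the closest to the clean audio.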
In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 utterances in the development set, 42,000 utterances in the seen test set, and 21,000 utterances in the unseen test set. More detailed statistics are given in Table 2.

Clean Real Audio Collection
To eliminate interference from irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

Clean Fake Audio Generation
We select 11 representative speech synthesis methods to generate the fake audio, along with one type of partially fake audio.

Noisy Audio Simulation
The noisy audio is intended to quantify the robustness of methods under noisy conditions. To simulate real-life scenarios, we sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. The additive noise is drawn from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.

This dataset is licensed under a CC BY-NC-ND 4.0 license.

You can cite the data using the following BibTeX entry:

@inproceedings{ma2022fad,
  title={FAD: A Chinese Dataset for Fake Audio Detection},
  author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
  booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
  year={2022},
}


Created:

2023-07-09
