Dataset for Speech Tampering Detection and Classification

Name: Dataset for Speech Tampering Detection and Classification
Creator: Science Data Bank
Published: 2026-04-28 02:09:19
License: No description available

Source: DataCite Commons (updated 2026-04-28, indexed 2026-05-05)

Download link:

https://www.scidb.cn/detail?dataSetId=f58f5237063f4ae599114b2e1d7a488d

Description:

The training and testing sets are carefully configured and incorporate MSFs and their corresponding GTMs. Each sample contains a set of MSFs and the corresponding GTM.

For DET-net training on Chinese speech data, we used 5,000 authentic samples and 20,000 tampered samples (5,000 per manipulation type), with 800 authentic and 3,200 tampered samples (800 per type) reserved for evaluation. For English speech data, the training set consisted of 4,500 authentic and 18,000 tampered samples (4,500 per type), while the evaluation set contained 800 authentic and 3,200 tampered samples (800 per type). For testing, we constructed a balanced testing set of 800 authentic speech samples and 800 manipulated audio samples of a single tampering type, which was then passed through the network for evaluation.

For CLF-net training, we used identical sample distributions for Chinese and English speech data: 3,000 authentic and 12,000 tampered samples (3,000 per manipulation type) for training, with 1,000 authentic and 4,000 tampered samples (1,000 per type) allocated for evaluation. Each sample is labeled with an integer from 0 to 4 representing its category: non-tampering, copy-move forgery, deletion, homologous splicing, or heterologous splicing, respectively. In contrast to the testing approach used for DET-net, CLF-net is evaluated on a single batch of 5,000 samples (1,000 authentic and 4,000 tampered) processed together.

To evaluate generalization, we additionally included a limited set of supplementary samples covering Spanish speech, splicing of AI-synthesized and authentic audio, and real-world tampered recordings for extended robustness testing.
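The split sizes and label scheme described above can be written down as a small sanity-check table. This is only a sketch: the names `TamperLabel`, `SPLITS`, `split_total`, and `expected_label_counts` are hypothetical and not part of the dataset, and it assumes the four manipulation types are equally represented in every split, as the per-type counts in the description suggest.

```python
from enum import IntEnum


class TamperLabel(IntEnum):
    """Integer labels 0-4 in the order listed in the description."""
    NON_TAMPERING = 0
    COPY_MOVE = 1
    DELETION = 2
    HOMOLOGOUS_SPLICING = 3
    HETEROLOGOUS_SPLICING = 4


# Number of manipulation types (every label except NON_TAMPERING).
NUM_TAMPER_TYPES = len(TamperLabel) - 1

# (authentic samples, tampered samples per manipulation type),
# keyed by (network, language, split). Values are taken from the description.
SPLITS = {
    ("DET-net", "Chinese", "train"): (5000, 5000),
    ("DET-net", "Chinese", "eval"): (800, 800),
    ("DET-net", "English", "train"): (4500, 4500),
    ("DET-net", "English", "eval"): (800, 800),
    ("CLF-net", "Chinese", "train"): (3000, 3000),
    ("CLF-net", "Chinese", "eval"): (1000, 1000),
    ("CLF-net", "English", "train"): (3000, 3000),
    ("CLF-net", "English", "eval"): (1000, 1000),
}


def split_total(authentic: int, per_type: int) -> int:
    """Total samples in a split: authentic plus all tampered samples."""
    return authentic + per_type * NUM_TAMPER_TYPES


def expected_label_counts(authentic: int, per_type: int) -> dict:
    """Expected number of samples per label for one split."""
    return {
        label: authentic if label is TamperLabel.NON_TAMPERING else per_type
        for label in TamperLabel
    }


if __name__ == "__main__":
    for key, (authentic, per_type) in SPLITS.items():
        print(key, split_total(authentic, per_type))
```

A table like this could be compared against the label counts of a downloaded copy to confirm that a split is complete before training.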

Provider:

Science Data Bank

Created:

2026-02-06
