Dataset for Speech Tampering Detection and Classification

Source: DataCite Commons; updated 2026-04-28; indexed 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=f58f5237063f4ae599114b2e1d7a488d
Resource description:
This dataset provides carefully configured training and testing sets built from MSFs and their corresponding GTMs; each sample contains a set of MSFs and the corresponding GTM.

For DET-net training on Chinese speech data, we used 5,000 authentic samples and 20,000 tampered samples (5,000 per manipulation type), and reserved 800 authentic and 3,200 tampered samples (800 per type) for evaluation. For English speech data, the training set consisted of 4,500 authentic and 18,000 tampered samples (4,500 per type), and the evaluation set contained 800 authentic and 3,200 tampered samples (800 per type). For testing, we constructed balanced test sets, each comprising 800 authentic samples and 800 tampered samples of a single tampering type, and passed each set through the network for evaluation.

For CLF-net training, we used identical sample distributions for Chinese and English speech data: 3,000 authentic and 12,000 tampered samples (3,000 per manipulation type) for training, and 1,000 authentic and 4,000 tampered samples (1,000 per type) for evaluation. Each sample is labeled with an integer from 0 to 4 denoting its category: 0, non-tampering; 1, copy-move forgery; 2, deletion; 3, homologous splicing; 4, heterologous splicing. Unlike DET-net, which is tested on one tampering type at a time, CLF-net is evaluated on a single batch of 5,000 samples (1,000 authentic and 4,000 tampered) processed simultaneously.

To evaluate generalization, we also included a small set of supplementary samples for extended robustness testing, covering Spanish speech, splices of AI-synthesized and authentic audio, and real-world tampered recordings.
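The split sizes and the 0-4 label scheme described above can be captured in a short Python sketch, which also checks that the per-type counts add up to the stated tampered totals across the four manipulation types. The `Label` names, the `Split` class, the `SPLITS` keys, and the `det_test_pairs` helper are hypothetical illustrations, not part of the released dataset; its actual file layout and sample identifiers are not described on this page, so plain integer IDs stand in for real samples.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    """CLF-net class labels, in the order given in the description."""
    NON_TAMPERING = 0
    COPY_MOVE = 1
    DELETION = 2
    HOMOLOGOUS_SPLICING = 3
    HETEROLOGOUS_SPLICING = 4


# The four manipulation types (every label except non-tampering).
TAMPER_TYPES = [label for label in Label if label != Label.NON_TAMPERING]


@dataclass(frozen=True)
class Split:
    authentic: int  # number of authentic samples
    per_type: int   # number of tampered samples per manipulation type

    @property
    def tampered(self) -> int:
        return self.per_type * len(TAMPER_TYPES)

    @property
    def total(self) -> int:
        return self.authentic + self.tampered


# Counts from the description; the (network, language, split) keys are hypothetical.
SPLITS = {
    ("DET-net", "zh", "train"): Split(5000, 5000),
    ("DET-net", "zh", "eval"): Split(800, 800),
    ("DET-net", "en", "train"): Split(4500, 4500),
    ("DET-net", "en", "eval"): Split(800, 800),
    ("CLF-net", "zh", "train"): Split(3000, 3000),
    ("CLF-net", "zh", "eval"): Split(1000, 1000),
    ("CLF-net", "en", "train"): Split(3000, 3000),
    ("CLF-net", "en", "eval"): Split(1000, 1000),
}


def det_test_pairs(authentic_ids, tampered_ids_by_type, tamper_type):
    """Build one balanced DET-net test set: all authentic samples plus the
    samples of a single tampering type, with binary labels (0 authentic, 1 tampered)."""
    tampered_ids = tampered_ids_by_type[tamper_type]
    if len(tampered_ids) != len(authentic_ids):
        raise ValueError("a balanced test set needs equal authentic and tampered counts")
    return [(i, 0) for i in authentic_ids] + [(i, 1) for i in tampered_ids]
```

For example, `det_test_pairs(list(range(800)), {Label.DELETION: list(range(800, 1600))}, Label.DELETION)` yields the 1,600-sample deletion test set, and repeating it for each entry of `TAMPER_TYPES` gives the four per-type DET-net evaluations, whereas the CLF-net evaluation would use all 5,000 samples of `SPLITS[("CLF-net", "en", "eval")]` at once.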
Provider:
Science Data Bank
Created:
2026-02-06