Izzyzlin/CFSDD
Source: Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/Izzyzlin/CFSDD
---
license: apache-2.0
pretty_name: CFSDD
language:
- zh
tags:
- audio
- text
- speech-deepfake-detection
- telecom-fraud
size_categories:
- 100K<n<1M
configs:
- config_name: main
  default: true
  data_files:
  - split: train
    path: data/train/*.parquet
  - split: dev
    path: data/dev/*.parquet
  - split: test
    path:
    - data/test_clean/*.parquet
    - data/test_noise/*.parquet
    - data/test_ns/*.parquet
    - data/test_codec/*.parquet
dataset_info:
- config_name: main
  features:
  - name: key
    dtype: string
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: file_path
    dtype: string
  - name: speaker
    dtype: string
  - name: gender
    dtype: string
  - name: method
    dtype: string
  - name: label
    dtype: string
---
# CFSDD Dataset
## 📌 Overview
CFSDD is a Chinese speech deepfake benchmark for telecom fraud scenarios. Unlike conventional speech deepfake datasets that mainly focus on acoustic authenticity, CFSDD is designed from a risk-oriented perspective and jointly considers both acoustic authenticity and semantic intent. The benchmark explicitly distinguishes real benign speech from fake fraudulent speech, making it suitable for studying speech deepfake detection under realistic telecom-fraud conditions.
It contains two classes of samples:
- **Real benign speech**: real speech with benign semantic content, drawn from [MagicData-RAMC](https://github.com/MagicHub-io/MagicData-RAMC)
- **Fake fraudulent speech**: synthesized speech with fraudulent content, whose transcripts are derived from [TeleAntiFraud-28k](https://github.com/JimmyMa99/TeleAntiFraud)
## 🗂️ Split Organization
This repository is organized into three dataset splits:
- `train`: training split
- `dev`: development split
- `test`: test split
The `test` split aggregates four evaluation subsets:
- `test_clean`: clean test data
- `test_noise`: output after **noise addition**
- `test_ns`: output after **noise suppression**
- `test_codec`: output after **codec processing**
This organization follows the disturbance-oriented evaluation protocol described in the paper. The clean test data are further expanded through a sequential pipeline consisting of **noise addition**, **noise suppression**, and **codec processing** to better approximate realistic voice-call conditions.
To simulate realistic call conditions, the disturbance-oriented test subsets use the following external resources:
- `Noise addition`: [AudioSet](https://research.google.com/audioset), [Freesound](https://freesound.org/), and room impulse responses ([RIR](http://www.openslr.org/28/)), following the [ICASSP 2023 Deep Noise Suppression Challenge](https://github.com/microsoft/dns-challenge)
- `Noise suppression`: [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet)
- `Codec processing`: [Opus](https://github.com/xiph/opus)
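The card does not state the SNR range or the exact mixing procedure used for `test_noise`. As a rough illustration of the noise-addition stage only, the sketch below mixes a noise clip into an utterance at a chosen signal-to-noise ratio. The function name `mix_at_snr` and its parameters are hypothetical, the signals are synthetic, and this is not the authors' pipeline.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the speech-to-noise power ratio is `snr_db` dB.

    Assumes both inputs are 1-D arrays at the same sample rate and that the
    noise clip is not silent (non-zero power).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Tile or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Gain that scales the noise to the power implied by the target SNR.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Synthetic stand-ins for an utterance and a noise clip (1 s and 0.25 s at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)
noise = rng.normal(size=4000)

noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

A real pipeline would typically also convolve with a room impulse response and guard against clipping before writing the audio back to disk; those steps are left out here.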
## 📊 Dataset Statistics
CFSDD contains 766 hours of speech in total, with 394,908 utterances, 663 speakers, an average utterance duration of 6.98 seconds, and 10 TTS systems. The speakers in the train, development, and test splits are strictly disjoint. Five TTS systems appear in all splits, while the remaining five are reserved for the test split to evaluate generalization to unseen generators.
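Because five generators appear only in the test split, results are often easier to interpret when test utterances are grouped into seen and unseen generators. The following is a minimal sketch, assuming the `method` column names the TTS system of each synthesized utterance. The card does not document what `method` holds for real speech, so those rows may need to be excluded first. The helper `split_seen_unseen` and the system names are placeholders.

```python
def split_seen_unseen(train_methods, test_methods):
    """Partition generators found in test into those also in train and those not."""
    seen_in_train = set(train_methods)
    in_test = set(test_methods)
    return sorted(in_test & seen_in_train), sorted(in_test - seen_in_train)

# Placeholder system names, made up for illustration.
train_methods = ["tts_a", "tts_b", "tts_a"]
test_methods = ["tts_a", "tts_b", "tts_c", "tts_c"]

seen, unseen = split_seen_unseen(train_methods, test_methods)
print("seen in train:", seen)
print("test only:", unseen)
```

With the dataset loaded, the two arguments would be the `method` columns of the train and test splits.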
**The detailed statistics for each split:**
<div align="left">
<table>
<thead>
<tr>
<th>Split</th>
<th>Duration (h)</th>
<th># Utterances</th>
<th># Speakers</th>
<th># Systems</th>
</tr>
</thead>
<tbody>
<tr>
<td>Train</td>
<td align="right">139</td>
<td align="right">74,737</td>
<td align="right">300</td>
<td align="right">5</td>
</tr>
<tr>
<td>Dev</td>
<td align="right">49</td>
<td align="right">25,893</td>
<td align="right">100</td>
<td align="right">5</td>
</tr>
<tr>
<td>Test</td>
<td align="right">578</td>
<td align="right">294,278</td>
<td align="right">263</td>
<td align="right">10</td>
</tr>
<tr>
<td>Total</td>
<td align="right">766</td>
<td align="right">394,908</td>
<td align="right">663</td>
<td align="right">10</td>
</tr>
</tbody>
</table>
</div>
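The totals and the average utterance duration quoted above can be cross-checked from the per-split rows of the table. The short check below uses only the numbers in the table; since the hour figures are rounded, the computed average is approximate.

```python
# Per-split figures copied from the table above.
splits = {
    "train": {"hours": 139, "utterances": 74_737, "speakers": 300},
    "dev": {"hours": 49, "utterances": 25_893, "speakers": 100},
    "test": {"hours": 578, "utterances": 294_278, "speakers": 263},
}

total_hours = sum(s["hours"] for s in splits.values())
total_utterances = sum(s["utterances"] for s in splits.values())
# Speakers are disjoint across splits, so the per-split counts simply add up.
total_speakers = sum(s["speakers"] for s in splits.values())

# Average utterance duration in seconds, from rounded hour totals.
avg_seconds = total_hours * 3600 / total_utterances

print(total_hours, total_utterances, total_speakers)
print(round(avg_seconds, 2))
```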
**Distribution by class, gender, and test conditions:**
<p align="left">
<img src="assets/distribution1.png" alt="CFSDD distribution" width="450">
</p>
**Distribution of TTS systems and utterance durations:**
<p align="left">
<img src="assets/distribution2.png" alt="CFSDD generation and duration statistics" width="450">
</p>
## 💻 Example Usage
```python
from datasets import load_dataset

# Load the default "main" config; the result holds the train, dev, and test splits.
ds = load_dataset("Izzyzlin/CFSDD", "main")
train_ds = ds["train"]
dev_ds = ds["dev"]
test_ds = ds["test"]

# Inspect one sample from each split.
print(train_ds[0])
print(dev_ds[0])
print(test_ds[0])

# The test split merges four subsets; select each one by the subset name in `key`.
test_clean = test_ds.filter(lambda x: "test_clean" in x["key"])
test_noise = test_ds.filter(lambda x: "test_noise" in x["key"])
test_ns = test_ds.filter(lambda x: "test_ns" in x["key"])
test_codec = test_ds.filter(lambda x: "test_codec" in x["key"])

# Inspect one sample from each test subset.
print(test_clean[0])
print(test_noise[0])
print(test_ns[0])
print(test_codec[0])
```
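The example above scans the test split once per subset. When only the per-condition counts or a per-row condition tag are needed, a single pass over the keys may be enough. The sketch below relies on the same assumption as the example, namely that each `key` contains its subset name. The helpers `condition_from_key` and `count_conditions` and the sample keys are hypothetical.

```python
from collections import Counter

TEST_CONDITIONS = ("test_clean", "test_noise", "test_ns", "test_codec")

def condition_from_key(key):
    """Return the test subset named in `key`, or None if no subset name is found."""
    for name in TEST_CONDITIONS:
        if name in key:
            return name
    return None

def count_conditions(keys):
    """Count how many keys fall into each test subset in a single pass."""
    return Counter(condition_from_key(k) for k in keys)

# Hypothetical keys, made up for illustration; real keys may look different.
example_keys = ["test_clean/utt_0001", "test_ns/utt_0002", "test_clean/utt_0003"]
print(count_conditions(example_keys))
```

With the dataset loaded, `count_conditions(test_ds["key"])` would apply the same logic to the real keys, and reading only the `key` column should avoid decoding any audio.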
## 🙏 Acknowledgements
CFSDD is built on top of valuable public resources. If you use this dataset, please also consider citing the original data sources and the TTS systems used to construct the fake fraudulent speech.
### 📚 Data Sources
```bibtex
@article{yang2022open,
title={Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset},
author={Yang, Zehui and Chen, Yifan and Luo, Lei and Yang, Runyan and Ye, Lingxuan and Cheng, Gaofeng and Xu, Ji and Jin, Yaohui and Zhang, Qingqing and Zhang, Pengyuan and others},
journal={arXiv preprint arXiv:2203.16844},
year={2022}
}
@inproceedings{ma2025teleantifraud,
title={TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection},
author={Ma, Zhiming and Wang, Peidong and Huang, Minhua and Wang, Jinpeng and Wu, Kai and Lv, Xiangzhao and Pang, Yachun and Yang, Yin and Tang, Wenjie and Kang, Yuchen},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
pages={5853--5862},
year={2025}
}
```
### 🤖 TTS Systems
```bibtex
@inproceedings{chen2025f5,
title={F5-TTS: A fairytaler that fakes fluent and faithful speech with flow matching},
author={Chen, Yushen and Niu, Zhikang and Ma, Ziyang and Deng, Keqi and Wang, Chunhui and Zhao, Jian and Yu, Kai and Chen, Xie},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={6255--6271},
year={2025}
}
@article{voxcpm2025,
title={VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning},
author={Zhou, Yixuan and Zeng, Guoyang and Liu, Xin and Li, Xiang and Yu, Renjie and Wang, Ziyang and Ye, Runchuan and Sun, Weiyue and Gui, Jiancheng and Li, Kehan and Wu, Zhiyong and Liu, Zhiyuan},
journal={arXiv preprint arXiv:2509.24650},
year={2025}
}
@article{zhu2025zipvoice,
title={ZipVoice: Fast and high-quality zero-shot text-to-speech with flow matching},
author={Zhu, Han and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Li, Zhaoqing and Zhuang, Weiji and Lin, Long and Povey, Daniel},
journal={arXiv preprint arXiv:2506.13053},
year={2025}
}
@article{zhou2025indextts2,
title={IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech},
author={Zhou, Siyi and Zhou, Yiquan and He, Yi and Zhou, Xun and Wang, Jinchao and Deng, Wei and Shu, Jingchen},
journal={arXiv preprint arXiv:2506.21619},
year={2025}
}
@article{wang2025spark,
title={Spark-TTS: An efficient LLM-based text-to-speech model with single-stream decoupled speech tokens},
author={Wang, Xinsheng and Jiang, Mingqi and Ma, Ziyang and Zhang, Ziyu and Liu, Songxiang and Li, Linqin and Liang, Zheng and Zheng, Qixi and Wang, Rui and Feng, Xiaoqin and others},
journal={arXiv preprint arXiv:2503.01710},
year={2025}
}
@article{du2025cosyvoice,
title={CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training},
author={Du, Zhihao and Gao, Changfeng and Wang, Yuxuan and Yu, Fan and Zhao, Tianyu and Wang, Hao and Lv, Xiang and Wang, Hui and Shi, Xian and An, Keyu and others},
journal={arXiv preprint arXiv:2505.17589},
year={2025}
}
@article{cui2025glm,
title={GLM-TTS Technical Report},
author={Cui, Jiayan and Yang, Zhihan and Li, Naihan and Tian, Jiankun and Ma, Xingyu and Zhang, Yi and Chen, Guangyu and Yang, Runxuan and Cheng, Yuqing and Zhou, Yizhi and others},
journal={arXiv preprint arXiv:2512.14291},
year={2025}
}
@article{hu2026qwen3,
title={Qwen3-TTS Technical Report},
author={Hu, Hangrui and Zhu, Xinfa and He, Ting and Guo, Dake and Zhang, Bin and Wang, Xiong and Guo, Zhifang and Jiang, Ziyue and Hao, Hongkun and Guo, Zishan and others},
journal={arXiv preprint arXiv:2601.15621},
year={2026}
}
@article{xie2025fireredtts,
title={FireRedTTS-2: Towards long conversational speech generation for podcast and chatbot},
author={Xie, Kun and Shen, Feiyu and Li, Junjie and Xie, Fenglong and Tang, Xu and Hu, Yao},
journal={arXiv preprint arXiv:2509.02020},
year={2025}
}
@article{liao2026fish,
title={Fish Audio S2 Technical Report},
author={Liao, Shijia and Wang, Yuxuan and Liu, Songting and Cheng, Yifan and Zhang, Ruoyi and Li, Tianyu and Li, Shidong and Zheng, Yisheng and Liu, Xingwei and Wang, Qingzheng and others},
journal={arXiv preprint arXiv:2603.08823},
year={2026}
}
```
Provider: Izzyzlin



