Izzyzlin/CFSDD

Hugging Face | Updated 2026-04-07 | Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/Izzyzlin/CFSDD
Resource description:
---
license: apache-2.0
pretty_name: CFSDD
language:
- zh
tags:
- audio
- text
- speech-deepfake-detection
- telecom-fraud
size_categories:
- 100K<n<1M
configs:
- config_name: main
  default: true
  data_files:
  - split: train
    path: data/train/*.parquet
  - split: dev
    path: data/dev/*.parquet
  - split: test
    path:
    - data/test_clean/*.parquet
    - data/test_noise/*.parquet
    - data/test_ns/*.parquet
    - data/test_codec/*.parquet
dataset_info:
- config_name: main
  features:
  - name: key
    dtype: string
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: file_path
    dtype: string
  - name: speaker
    dtype: string
  - name: gender
    dtype: string
  - name: method
    dtype: string
  - name: label
    dtype: string
---

# CFSDD Dataset

## 📌 Overview

CFSDD is a Chinese speech deepfake benchmark for telecom fraud scenarios. Unlike conventional speech deepfake datasets that mainly focus on acoustic authenticity, CFSDD is designed from a risk-oriented perspective and jointly considers both acoustic authenticity and semantic intent. The benchmark explicitly distinguishes real benign speech from fake fraudulent speech, making it suitable for studying speech deepfake detection under realistic telecom-fraud conditions.
It contains two classes of samples:

- **Real benign speech**: real speech with benign semantic content, drawn from [MagicData-RAMC](https://github.com/MagicHub-io/MagicData-RAMC)
- **Fake fraudulent speech**: synthesized speech with fraudulent content, whose transcripts are derived from [TeleAntiFraud-28k](https://github.com/JimmyMa99/TeleAntiFraud)

## 🗂️ Split Organization

This repository is organized into three dataset splits:

- `train`: training split
- `dev`: development split
- `test`: test split

The `test` split aggregates four evaluation subsets:

- `test_clean`: clean test data
- `test_noise`: output after **noise addition**
- `test_ns`: output after **noise suppression**
- `test_codec`: output after **codec processing**

This organization follows the disturbance-oriented evaluation protocol described in the paper. The clean test data are further expanded through a sequential pipeline consisting of **noise addition**, **noise suppression**, and **codec processing** to better approximate realistic voice-call conditions.

To simulate realistic call conditions, the disturbance-oriented test subsets use the following external resources:

- `Noise addition`: [Audioset](https://research.google.com/audioset), [Freesound](https://freesound.org/), [RIR](http://www.openslr.org/28/) (following [ICASSP 2023 Deep Noise Suppression Challenge](https://github.com/microsoft/dns-challenge))
- `Noise suppression`: [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet)
- `Codec processing`: [Opus](https://github.com/xiph/opus)

## 📊 Dataset Statistics

CFSDD contains 766 hours of speech in total, with 394,908 utterances, 663 speakers, an average utterance duration of 6.98 seconds, and 10 TTS systems. The speakers in the train, development, and test splits are strictly disjoint. Five TTS systems appear in all splits, while the remaining five are reserved for the test split to evaluate generalization to unseen generators.

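
The split protocol described above (disjoint speakers, plus generators that appear only in the test split) can be sanity-checked with a small helper. The following is a minimal sketch, not part of the official tooling: the function names `overlapping_speakers` and `unseen_systems` and the toy IDs are hypothetical, and it assumes that the `speaker` and `method` columns hold plain string identifiers. How real speech is encoded in `method` is not stated here, so that value may need to be excluded before comparing systems.

```python
from typing import Dict, Iterable, Set, Tuple


def overlapping_speakers(
    speakers_by_split: Dict[str, Iterable[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Return the speakers shared by each pair of splits.

    An empty dict means all splits are pairwise speaker-disjoint.
    """
    names = list(speakers_by_split)
    sets = {name: set(speakers_by_split[name]) for name in names}
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = sets[first] & sets[second]
            if shared:
                overlaps[(first, second)] = shared
    return overlaps


def unseen_systems(
    train_methods: Iterable[str], test_methods: Iterable[str]
) -> Set[str]:
    """Return the generation methods present in test but absent from train."""
    return set(test_methods) - set(train_methods)


# Toy identifiers (hypothetical, for illustration only).
toy_speakers = {
    "train": ["spk_a", "spk_b"],
    "dev": ["spk_c"],
    "test": ["spk_d", "spk_e"],
}
toy_overlaps = overlapping_speakers(toy_speakers)
toy_unseen = unseen_systems(["tts_1", "tts_2"], ["tts_1", "tts_2", "tts_3"])

print(toy_overlaps)
print(toy_unseen)
```

With the actual dataset loaded as `ds`, one would presumably pass the column values, for example `{name: ds[name]["speaker"] for name in ("train", "dev", "test")}`. Reading a single column this way avoids decoding the audio for every row.
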
**The detailed statistics for each split:**

<div align="left">
<table>
  <thead>
    <tr>
      <th>Split</th>
      <th>Duration (h)</th>
      <th># Utterances</th>
      <th># Speakers</th>
      <th># Systems</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Train</td>
      <td align="right">139</td>
      <td align="right">74,737</td>
      <td align="right">300</td>
      <td align="right">5</td>
    </tr>
    <tr>
      <td>Dev</td>
      <td align="right">49</td>
      <td align="right">25,893</td>
      <td align="right">100</td>
      <td align="right">5</td>
    </tr>
    <tr>
      <td>Test</td>
      <td align="right">578</td>
      <td align="right">294,278</td>
      <td align="right">263</td>
      <td align="right">10</td>
    </tr>
    <tr>
      <td>Total</td>
      <td align="right">766</td>
      <td align="right">394,908</td>
      <td align="right">663</td>
      <td align="right">10</td>
    </tr>
  </tbody>
</table>
</div>

**Distribution by class, gender, and test conditions:**

<p align="left">
  <img src="assets/distribution1.png" alt="CFSDD distribution" width="450">
</p>

**Distribution of TTS systems and utterance durations:**

<p align="left">
  <img src="assets/distribution2.png" alt="CFSDD generation and duration statistics" width="450">
</p>

## 💻 Example Usage

```python
from datasets import load_dataset

# Load the default "main" configuration, which exposes train/dev/test splits.
ds = load_dataset("Izzyzlin/CFSDD", "main")

train_ds = ds["train"]
dev_ds = ds["dev"]
test_ds = ds["test"]

# Inspect one example from each split.
print(train_ds[0])
print(dev_ds[0])
print(test_ds[0])

# The test split merges four subsets; select each one by the subset name in `key`.
test_clean = test_ds.filter(lambda x: "test_clean" in x["key"])
test_noise = test_ds.filter(lambda x: "test_noise" in x["key"])
test_ns = test_ds.filter(lambda x: "test_ns" in x["key"])
test_codec = test_ds.filter(lambda x: "test_codec" in x["key"])

print(test_clean[0])
print(test_noise[0])
print(test_ns[0])
print(test_codec[0])
```

## 🙏 Acknowledgements

CFSDD is built on top of valuable public resources. If you use this dataset, please also consider citing the original data sources and the TTS systems used to construct the fake fraudulent speech.
### 📚 Data Sources

```bibtex
@article{yang2022open,
  title={Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset},
  author={Yang, Zehui and Chen, Yifan and Luo, Lei and Yang, Runyan and Ye, Lingxuan and Cheng, Gaofeng and Xu, Ji and Jin, Yaohui and Zhang, Qingqing and Zhang, Pengyuan and others},
  journal={arXiv preprint arXiv:2203.16844},
  year={2022}
}

@inproceedings{ma2025teleantifraud,
  title={TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection},
  author={Ma, Zhiming and Wang, Peidong and Huang, Minhua and Wang, Jinpeng and Wu, Kai and Lv, Xiangzhao and Pang, Yachun and Yang, Yin and Tang, Wenjie and Kang, Yuchen},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={5853--5862},
  year={2025}
}
```

### 🤖 TTS Systems

```bibtex
@inproceedings{chen2025f5,
  title={F5-tts: A fairytaler that fakes fluent and faithful speech with flow matching},
  author={Chen, Yushen and Niu, Zhikang and Ma, Ziyang and Deng, Keqi and Wang, Chunhui and JianZhao, JianZhao and Yu, Kai and Chen, Xie},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={6255--6271},
  year={2025}
}

@article{voxcpm2025,
  title={VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning},
  author={Zhou, Yixuan and Zeng, Guoyang and Liu, Xin and Li, Xiang and Yu, Renjie and Wang, Ziyang and Ye, Runchuan and Sun, Weiyue and Gui, Jiancheng and Li, Kehan and Wu, Zhiyong and Liu, Zhiyuan},
  journal={arXiv preprint arXiv:2509.24650},
  year={2025}
}

@article{zhu2025zipvoice,
  title={Zipvoice: Fast and high-quality zero-shot text-to-speech with flow matching},
  author={Zhu, Han and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Li, Zhaoqing and Zhuang, Weiji and Lin, Long and Povey, Daniel},
  journal={arXiv preprint arXiv:2506.13053},
  year={2025}
}

@article{zhou2025indextts2,
  title={IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech},
  author={Zhou, Siyi and Zhou, Yiquan and He, Yi and Zhou, Xun and Wang, Jinchao and Deng, Wei and Shu, Jingchen},
  journal={arXiv preprint arXiv:2506.21619},
  year={2025}
}

@article{wang2025spark,
  title={Spark-tts: An efficient llm-based text-to-speech model with single-stream decoupled speech tokens},
  author={Wang, Xinsheng and Jiang, Mingqi and Ma, Ziyang and Zhang, Ziyu and Liu, Songxiang and Li, Linqin and Liang, Zheng and Zheng, Qixi and Wang, Rui and Feng, Xiaoqin and others},
  journal={arXiv preprint arXiv:2503.01710},
  year={2025}
}

@article{du2025cosyvoice,
  title={CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training},
  author={Du, Zhihao and Gao, Changfeng and Wang, Yuxuan and Yu, Fan and Zhao, Tianyu and Wang, Hao and Lv, Xiang and Wang, Hui and Shi, Xian and An, Keyu and others},
  journal={arXiv preprint arXiv:2505.17589},
  year={2025}
}

@article{cui2025glm,
  title={Glm-tts technical report},
  author={Cui, Jiayan and Yang, Zhihan and Li, Naihan and Tian, Jiankun and Ma, Xingyu and Zhang, Yi and Chen, Guangyu and Yang, Runxuan and Cheng, Yuqing and Zhou, Yizhi and others},
  journal={arXiv preprint arXiv:2512.14291},
  year={2025}
}

@article{hu2026qwen3,
  title={Qwen3-TTS Technical Report},
  author={Hu, Hangrui and Zhu, Xinfa and He, Ting and Guo, Dake and Zhang, Bin and Wang, Xiong and Guo, Zhifang and Jiang, Ziyue and Hao, Hongkun and Guo, Zishan and others},
  journal={arXiv preprint arXiv:2601.15621},
  year={2026}
}

@article{xie2025fireredtts,
  title={Fireredtts-2: Towards long conversational speech generation for podcast and chatbot},
  author={Xie, Kun and Shen, Feiyu and Li, Junjie and Xie, Fenglong and Tang, Xu and Hu, Yao},
  journal={arXiv preprint arXiv:2509.02020},
  year={2025}
}

@article{liao2026fish,
  title={Fish Audio S2 Technical Report},
  author={Liao, Shijia and Wang, Yuxuan and Liu, Songting and Cheng, Yifan and Zhang, Ruoyi and Li, Tianyu and Li, Shidong and Zheng, Yisheng and Liu, Xingwei and Wang, Qingzheng and others},
  journal={arXiv preprint arXiv:2603.08823},
  year={2026}
}
```
Provider: Izzyzlin