VISH-DARIJA-TTS: A Synthetic Moroccan Darija Text-Audio Dataset for Vishing and Voice-Based Social Engineering Detection

Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20039125
Description:
VISH-DARIJA-TTS is a synthetic Moroccan Darija text-audio dataset designed for research on detecting vishing (voice phishing) and other voice-based social engineering. The dataset contains 3,400 multi-turn scenarios, 1,700 scam and 1,700 non-scam dialogues, each represented in both Latin-script and Arabic-script Moroccan Darija. It includes 3,400 normalized clean TTS-generated WAV files and 10,200 controlled noisy variants, one per clean file at each of 20 dB, 10 dB, and 5 dB SNR. The release also includes scenario-level and turn-level metadata, emotion normalization, social-engineering taxonomies, audio manifests, train/validation/test splits, validation reports, checksums, and documentation. The dataset is synthetic and does not contain real victim calls or human-recorded scam calls. Clean text files are aligned with the audio files, while public text files provide a sanitized version with sensitive operational details replaced by placeholders.
Provider:
Zenodo
Created:
2026-05-05
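
The description mentions noisy variants at fixed SNR levels but does not say how the noise was mixed. As an illustration only, the sketch below shows one common way to add noise at a target SNR: scale the noise so that the ratio of clean power to noise power matches the requested value in dB. The function names (`mix_at_snr`, `measured_snr_db`), the sine-wave stand-in for speech, and the Gaussian noise source are all assumptions for this example and are not taken from the dataset release.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the clean-to-noise power ratio is snr_db.

    Hypothetical helper: the dataset's actual mixing procedure is not documented here.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_clean / P_noise_scaled), solved for the noise gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Stand-in signals (assumptions): a 220 Hz tone at 16 kHz and Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The three SNR levels named in the description.
SNR_LEVELS_DB = (20, 10, 5)
variants = {snr: mix_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}

# Three noisy variants per clean file gives the stated total.
total_noisy_files = 3400 * len(SNR_LEVELS_DB)
```

With three SNR levels per clean file, 3,400 clean files yield the 10,200 noisy variants stated in the description. For real WAV data the same scaling applies after decoding the samples to floats; clipping and re-normalization after mixing would need separate handling.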