amphion/spmis
Source: Hugging Face. Updated: 2024-11-29. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/amphion/spmis
Description:
The SpMis dataset is designed to support research on detecting synthetic spoken misinformation. It contains 360,611 audio samples synthesized from over 1,000 speakers across five topics: Politics, Medicine, Education, Laws, and Finance; 8,681 of these samples are labeled as misinformation. The audio clips are generated with state-of-the-art TTS models such as Amphion and OpenVoice v2, and each clip is labeled as either genuine or misinformation. The dataset comprises audio files, labels, and metadata such as speaker identity, topic, duration, and language, and it is organized by topic, with the number of samples and misinformation labels varying per topic. Its primary purpose is to train and evaluate misinformation detection models, specifically models that identify whether spoken content is intended to mislead, and more broadly to assist the development of models that detect both synthetic speech and misinformation. The dataset was created in response to the growing threat of synthetic spoken misinformation in the era of deepfake technologies.
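The sample counts in the description imply a strongly imbalanced label distribution, which matters when training a detector. The following is a minimal sketch of that arithmetic, not part of the dataset's own tooling. It assumes that every sample not labeled as misinformation belongs to a single other class, and the inverse-frequency weighting shown is one common heuristic chosen here for illustration; the names `balanced_weight` and `weights` are hypothetical.

```python
# Counts taken from the dataset description.
TOTAL_SAMPLES = 360_611
MISINFO_SAMPLES = 8_681

# Assumption: all samples not labeled as misinformation form one other class.
other_samples = TOTAL_SAMPLES - MISINFO_SAMPLES
misinfo_ratio = MISINFO_SAMPLES / TOTAL_SAMPLES


def balanced_weight(class_count, total=TOTAL_SAMPLES, n_classes=2):
    """Inverse-frequency class weight: total / (n_classes * class_count)."""
    return total / (n_classes * class_count)


# Hypothetical class weights that could be passed to a weighted loss.
weights = {
    "misinformation": balanced_weight(MISINFO_SAMPLES),
    "other": balanced_weight(other_samples),
}

print(f"misinformation share: {misinfo_ratio:.2%}")
for name, weight in weights.items():
    print(f"{name}: weight {weight:.2f}")
```

Under these assumptions, roughly 2.4% of the samples carry the misinformation label, so a classifier that always predicts the majority class would score high accuracy while detecting nothing; metrics such as recall or F1 on the misinformation class are likely more informative than accuracy.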
Provided by:
amphion



