HaoY0001/LlamaPartialSpoof
Source: Hugging Face · Updated 2024-12-02 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/HaoY0001/LlamaPartialSpoof
Description:
The LlamaPartialSpoof dataset is a fully and partially fake speech dataset designed from the attacker's perspective. It is created with a novel generation pipeline that combines a large language model (LLM) with various TTS models. Version v1.0.b includes two parts: R01TTS.0.a contains bonafide speech, fully fake speech (TTS001--006), and partially fake speech (TTS001--006) created with crossfade; R01TTS.0.b contains partially fake speech (TTS001--006) created with cut/paste or overlap/add techniques. The labels for each part are stored in corresponding text files, while the speech samples are stored in compressed archives. Each line in a label file is formatted as `<id> <utterance-duration> <utterance-label> <segment1> <segment2> ... <segmentN>`, with each segment formatted as `<start>-<end>-<label>`, where the label is either bonafide or spoof. The dataset also includes a metadata file (metadata_crossfade.csv) describing the fading function used to create the crossfade partially fake samples. The dataset is in English and is licensed under cc-by-4.0.
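A minimal sketch of reading one line of the label format above. It assumes fields are whitespace-separated, that durations and segment boundaries are non-negative decimal numbers in seconds (so `-` appears only as the segment delimiter), and that no other fields exist. The example line and the names `Segment`, `LabelEntry`, `parse_label_line`, and `spoofed_seconds` are hypothetical illustrations, not part of the dataset release.

```python
from dataclasses import dataclass
from typing import List

VALID_LABELS = {"bonafide", "spoof"}


@dataclass
class Segment:
    start: float  # assumed to be seconds
    end: float    # assumed to be seconds
    label: str    # "bonafide" or "spoof"


@dataclass
class LabelEntry:
    utt_id: str
    duration: float        # utterance duration, assumed to be seconds
    label: str             # utterance-level label
    segments: List[Segment]


def parse_segment(token: str) -> Segment:
    """Parse one '<start>-<end>-<label>' token (assumes no negative times)."""
    parts = token.split("-")
    if len(parts) != 3:
        raise ValueError(f"malformed segment: {token!r}")
    start, end, label = parts
    if label not in VALID_LABELS:
        raise ValueError(f"unknown segment label: {label!r}")
    return Segment(float(start), float(end), label)


def parse_label_line(line: str) -> LabelEntry:
    """Parse '<id> <utterance-duration> <utterance-label> <segment1> ... <segmentN>'."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    utt_id, duration, label, *seg_tokens = fields
    if label not in VALID_LABELS:
        raise ValueError(f"unknown utterance label: {label!r}")
    return LabelEntry(utt_id, float(duration), label,
                      [parse_segment(t) for t in seg_tokens])


def spoofed_seconds(entry: LabelEntry) -> float:
    """Total length of the segments labelled 'spoof'."""
    return sum(s.end - s.start for s in entry.segments if s.label == "spoof")


if __name__ == "__main__":
    # Hypothetical line; real IDs and timings come from the dataset's label files.
    line = "utt_0001 3.20 spoof 0.00-1.10-bonafide 1.10-2.05-spoof 2.05-3.20-bonafide"
    entry = parse_label_line(line)
    print(entry.utt_id, len(entry.segments), round(spoofed_seconds(entry), 2))
```

Keeping segments as small typed records makes it straightforward to derive per-frame targets or spoof-duration statistics for partial-spoof localization; if the real files turn out to use a different time unit or delimiter, only `parse_segment` needs to change.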
Provided by:
HaoY0001



