HaoY0001/LlamaPartialSpoof

Name: HaoY0001/LlamaPartialSpoof
Creator: HaoY0001
Published: 2024-12-02 05:01:54
License: not listed (the dataset description states cc-by-4.0)

Source: Hugging Face. Updated 2024-12-02; indexed 2024-12-14.

Download link:

https://hf-mirror.com/datasets/HaoY0001/LlamaPartialSpoof


Description:

LlamaPartialSpoof is a dataset of fully and partially fake speech, designed from the attacker's perspective. It is created with a novel generation pipeline that combines a large language model (LLM) with various TTS models. Version 1.0.b has two parts: R01TTS.0.a contains bonafide speech, fully fake speech (TTS001--006), and partially fake speech (TTS001--006) generated with crossfade; R01TTS.0.b contains partially fake speech (TTS001--006) generated with cut/paste or overlap/add techniques.

The labels for each part are stored in a corresponding text file, and the speech samples are stored in archives. Each line in a label file is formatted as `<id> <utterance-duration> <utterance-label> <segment1> <segment2> ... <segmentN>`, and each segment is formatted as `<start>-<end>-<label>`, where the label is either bonafide or spoof. The dataset also includes a metadata file (metadata_crossfade.csv) that records the fading function used to create the crossfade partially fake samples. The dataset is in English and is licensed under cc-by-4.0.
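The label-line format described above can be parsed with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset. It assumes that fields are separated by whitespace and that `<start>`, `<end>`, and `<utterance-duration>` are non-negative decimal numbers (the unit is presumably seconds, but the listing does not say). The names `Segment`, `Utterance`, `parse_label_line`, and `spoof_duration`, and the example line itself, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float
    end: float
    label: str  # "bonafide" or "spoof"


@dataclass
class Utterance:
    utt_id: str
    duration: float
    label: str
    segments: List[Segment]


def parse_label_line(line: str) -> Utterance:
    """Parse '<id> <utterance-duration> <utterance-label> <segment1> ... <segmentN>'."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    utt_id, duration, utt_label = fields[0], float(fields[1]), fields[2]

    segments = []
    for field in fields[3:]:
        # Each segment is '<start>-<end>-<label>'. Split the label off from
        # the right first, then split the remaining '<start>-<end>' span.
        # Assumption: start and end are plain non-negative decimals.
        span, seg_label = field.rsplit("-", 1)
        start, end = span.split("-", 1)
        if seg_label not in ("bonafide", "spoof"):
            raise ValueError(f"unexpected segment label: {seg_label!r}")
        segments.append(Segment(float(start), float(end), seg_label))

    return Utterance(utt_id, duration, utt_label, segments)


def spoof_duration(utt: Utterance) -> float:
    """Total length of all segments labelled 'spoof'."""
    return sum(s.end - s.start for s in utt.segments if s.label == "spoof")


# Hypothetical example line, made up to match the documented format.
example = "utt_0001 3.20 spoof 0.00-1.25-bonafide 1.25-2.10-spoof 2.10-3.20-bonafide"
utt = parse_label_line(example)
```

Splitting the label from the right keeps the parser independent of how many digits the timestamps carry. If the real files use a different separator or number format, the two split calls are the only lines that should need to change.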

Provided by:

HaoY0001
