ArEnAV

Name: ArEnAV
Creator: MBZUAI, Monash University
Published: 2025-05-29 00:54:36
License: No description available

arXiv: updated 2025-05-29; indexed 2025-11-28

Download link:

https://huggingface.co/datasets/kartik060702/ArEnAV-Full

Description:

ArEnAV is a large-scale Arabic-English audio-visual deepfake dataset built around intra-sentential code-switching and dialectal variation. It covers both bilingual (Arabic-English) and diglossic switching across Modern Standard Arabic and the Egyptian, Levantine, and Gulf dialects, a setting largely missing from multilingual deepfake research. The dataset contains approximately 765 hours of video sourced from 8,809 unique YouTube videos and is presented as the first and largest Arabic-English code-switching audio-visual deepfake dataset, intended as a benchmark for multilingual deepfake detection.

Fake clips are produced by systematically manipulating the semantic content of the speech to introduce Arabic-English code-switching while preserving the identity and environmental context of the original video. Three manipulation strategies are used: fake audio with fake video, fake audio with real video, and real audio with fake video.

Construction follows three stages: transcript modification, audio generation, and video generation. Transcript modification uses GPT-4.1-mini for content-driven edits and defines eight transcript-change modes, spanning code-switching and Arabic-only settings, for fine-grained control over how the transcript is altered. Audio generation synthesizes new speech while preserving the speaker's vocal characteristics. Video generation renders lip-synced video matched to the new audio, yielding realistic manipulated clips. The dataset is expected to advance deepfake research and support more relevant detection systems.
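The three manipulation strategies can be read as combinations of two independent flags, one for the audio track and one for the video track. The following is a minimal sketch of that mapping, not the dataset's actual schema: the `Manipulation` enum, its label strings, and the `classify` helper are hypothetical names introduced for illustration, and the unmodified `REAL` case is assumed to exist alongside the three fake categories.

```python
from enum import Enum


class Manipulation(Enum):
    # Label strings are illustrative assumptions, not field values from ArEnAV.
    REAL = "real_audio_real_video"  # assumed: unmodified source clip
    FAKE_AUDIO_FAKE_VIDEO = "fake_audio_fake_video"
    FAKE_AUDIO_REAL_VIDEO = "fake_audio_real_video"
    REAL_AUDIO_FAKE_VIDEO = "real_audio_fake_video"


def classify(audio_is_fake: bool, video_is_fake: bool) -> Manipulation:
    """Map per-modality fake flags to one of the manipulation categories."""
    if audio_is_fake and video_is_fake:
        return Manipulation.FAKE_AUDIO_FAKE_VIDEO
    if audio_is_fake:
        return Manipulation.FAKE_AUDIO_REAL_VIDEO
    if video_is_fake:
        return Manipulation.REAL_AUDIO_FAKE_VIDEO
    return Manipulation.REAL


def is_deepfake(category: Manipulation) -> bool:
    """A clip counts as fake if either modality was manipulated."""
    return category is not Manipulation.REAL
```

Treating the two modalities separately in this way lets a detector be evaluated both on the binary real/fake decision and on which track was altered.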

Provided by:

MBZUAI, Monash University

Created:

2025-05-29
