"InDeepFake: A Novel Multimodal Multilingual Indian Deepfake Video Dataset"

Name: "InDeepFake: A Novel Multimodal Multilingual Indian Deepfake Video Dataset"
Creator: IEEE DataPort
Published: 2025-08-06 10:57:54
License: No description available

Source: DataCite Commons (updated 2025-08-06; indexed 2026-05-03)

Download link:

https://ieee-dataport.org/documents/indeepfake-novel-multimodal-multilingual-indian-deepfake-video-dataset

Description:

"Recent advancements in Generative AI have resulted in decline of online digital contents credibility, at all levels of the human society. In spite of numerous discussions in popular media on the grave risks exposed by deepfakes and the relative lack of human awareness, deepfake based illegal activities are on the rise all over the world. India as a nation has seen rapid surge in deepfake cases reported in recent times, with news channels and media flooded with cases of financial fraudulence, personal vendetta, and false political propaganda, especially before the national and state elections. This can prove detrimental against the democratic future of the nation, indicating a serious need for efficient deepfake detectors in the coming days, tailored to investigate and solve Indian deepfake cases. The task is particularly challenging given the great linguistic and ethnic diversity of India. Based on this motivation, in our work, we develop an extensive deepfake dataset for the Indian population. To the best of our knowledge, this is the first such effort that is reported. We have developed a multimodal audio-video deepfake dataset, in seven major Indian languages, and seven state-of-the-art (SOTA) deepfake generators, covering a wide range of age and gender diversity. We evaluated SOTA detector results on the proposed dataset, to highlight its relevance in furthering multimodal deepfake research. We have open-sourced the dataset and code to implement the baseline methods at: https:\/\/github.com\/arnabdasphd\/InDeepFake."

Provider: IEEE DataPort
Created: 2025-08-06
