Direction-aware Cross-modal Transformer for Image Tampering Localization

Name: Direction-aware Cross-modal Transformer for Image Tampering Localization
Creator: Science Data Bank
Published: 2026-03-16 10:51:03
License: No description available

Source: DataCite Commons (updated 2026-03-16; indexed 2026-05-05)

Download link: https://www.scidb.cn/detail?dataSetId=9c124721b52f437e832497a386772c24

Description:

With the rapid development of image editing software and generative models, image manipulation has become increasingly accessible, posing serious challenges to the credibility of visual information. Image tampering localization, which aims to accurately identify and segment manipulated regions within an image, plays a critical role in digital forensics, public security, and media authentication. Despite significant progress by deep-learning-based approaches, existing methods still face notable limitations in complex manipulation scenarios. In particular, most approaches inadequately exploit the directional structural cues of tampered regions and insufficiently model cross-modal feature consistency, which degrades localization accuracy and robustness under diverse post-processing operations. To address these issues, this paper proposes a direction-aware cross-modal reasoning framework for image tampering localization. The proposed method leverages complementary information from the RGB image domain and the noise-related feature domain to construct a unified cross-modal representation. Unlike conventional fusion strategies that simply concatenate or sum multi-modal features, our framework explicitly incorporates directional awareness to model the structural consistency of tampered regions across different orientations. This design enables the model to better capture boundary characteristics and fine-grained geometric patterns that post-processing operations often weaken or distort.
Specifically, the proposed approach consists of three key components. First, a dual-branch feature extraction architecture separately learns semantic information from the RGB domain and forensic cues from the noise domain, allowing the model to preserve both high-level semantic context and low-level manipulation traces.
Second, a direction-aware mechanism is introduced to encode orientation-sensitive information, guiding the network to emphasize directional consistency within tampered regions while suppressing irrelevant background responses. By explicitly modeling directional dependencies, the proposed method enhances the discriminative representation of tampered boundaries and internal structures. Third, a cross-modal reasoning module is developed to facilitate adaptive interaction between RGB and noise features. This module enables mutual guidance and information refinement across modalities, thereby improving feature complementarity and reducing redundancy during the localization process.
Extensive experiments are conducted on multiple publicly available image tampering localization benchmarks to evaluate the effectiveness of the proposed method. Quantitative results demonstrate that the proposed approach consistently outperforms several state-of-the-art methods in terms of commonly used evaluation metrics, including F1-score and Intersection-over-Union (IoU). In particular, the proposed method shows notable performance gains in challenging scenarios involving complex tampering types and multiple post-processing operations, such as compression, blurring, and rescaling. Qualitative comparisons further illustrate that the proposed framework produces more accurate and coherent localization results, especially along tampered boundaries and fine structural details.
In addition, robustness experiments indicate that the proposed direction-aware cross-modal reasoning framework maintains stable performance under various degradations commonly encountered in real-world image dissemination environments. This robustness can be attributed to the effective integration of directional structural cues and complementary cross-modal information, which jointly enhance the model’s ability to distinguish manipulated regions from authentic content.
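The description names the three components but not their internals. The toy forward pass below is one plausible reading of each, offered as a sketch under explicit assumptions rather than the authors' method. The noise-branch input is assumed to be a high-pass residual from a zero-sum 3x3 kernel of the kind used in SRM-style noise streams. The direction-aware gate is assumed to scale each position by a sigmoid of its strongest mean response along the horizontal, vertical, diagonal, and anti-diagonal lines through it, a strip-pooling-style choice. Cross-modal interaction is assumed to be parameter-free scaled dot-product cross-attention in both directions with residual connections. All names (`noise_residual`, `direction_aware_gate`, `cross_attention`, `mutual_refine`) are hypothetical.

```python
# Toy forward pass illustrating the three components named in the description.
# Assumptions (not from the source): SRM-style high-pass noise input, strip-style
# directional pooling as the direction-aware gate, and parameter-free
# scaled dot-product cross-attention. All names here are hypothetical.
import math
from typing import List, Tuple

Map = List[List[float]]
Vec = List[float]

# Zero-sum 3x3 high-pass kernel: flat regions map to 0, local discontinuities do not.
HIGH_PASS = [[-1.0, 2.0, -1.0], [2.0, -4.0, 2.0], [-1.0, 2.0, -1.0]]


def noise_residual(gray: Map) -> Map:
    """Noise-branch input: 3x3 high-pass filtering with replicated borders, scaled by 1/4."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += HIGH_PASS[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 4.0
    return out


def directional_means(fmap: Map, y: int, x: int) -> Vec:
    """Mean response along the horizontal, vertical, diagonal and anti-diagonal lines through (y, x)."""
    h, w = len(fmap), len(fmap[0])
    cells = [(i, j) for i in range(h) for j in range(w)]
    lines = [
        [fmap[y][j] for j in range(w)],
        [fmap[i][x] for i in range(h)],
        [fmap[i][j] for i, j in cells if i - j == y - x],
        [fmap[i][j] for i, j in cells if i + j == y + x],
    ]
    return [sum(line) / len(line) for line in lines]


def direction_aware_gate(fmap: Map) -> Map:
    """Scale each position by a sigmoid of its strongest directional mean."""
    out = []
    for y, row in enumerate(fmap):
        gated = []
        for x, v in enumerate(row):
            context = max(directional_means(fmap, y, x))  # most consistent orientation
            gated.append(v / (1.0 + math.exp(-context)))  # v * sigmoid(context)
        out.append(gated)
    return out


def cross_attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Scaled dot-product attention: each query token aggregates the other modality's values."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(wt * v[c] for wt, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def mutual_refine(rgb: List[Vec], noise: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Bidirectional guidance with residual connections: RGB attends to noise and vice versa."""
    rgb_ctx = cross_attention(rgb, noise, noise)
    noise_ctx = cross_attention(noise, rgb, rgb)
    rgb_out = [[a + b for a, b in zip(t, c)] for t, c in zip(rgb, rgb_ctx)]
    noise_out = [[a + b for a, b in zip(t, c)] for t, c in zip(noise, noise_ctx)]
    return rgb_out, noise_out
```

In a trained network each step would act on learned multi-channel feature maps: the RGB branch would pass through a backbone first, and the attention would use learned query, key, and value projections, typically with multiple heads. Here tokens are plain vectors so the data flow between the two modalities stays visible.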
In conclusion, this work presents a novel direction-aware cross-modal reasoning approach for image tampering localization. By explicitly modeling directional consistency and leveraging cross-modal feature interactions, the proposed method achieves improved localization accuracy and robustness in complex manipulation scenarios. The experimental results demonstrate that the proposed framework provides a reliable and effective solution for image tampering localization, offering promising potential for practical applications in digital image forensics and multimedia security.
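The accuracy claims in the description are reported as F1-score and IoU. At the pixel level, with TP, FP, and FN counted between a binarized prediction and the ground-truth mask, F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The sketch below computes both. The 0.5 binarization threshold and the convention of scoring an empty prediction on an empty mask as 1.0 are assumptions, since benchmarks differ on both (and on whether scores are averaged per image or pooled over a dataset). The function names are hypothetical.

```python
# Pixel-level F1 and IoU for binary tampering masks (1 = tampered, 0 = authentic).
# Assumptions: threshold 0.5; empty-vs-empty scores 1.0 (conventions vary by benchmark).
from typing import List, Tuple

ProbMap = List[List[float]]
Mask = List[List[int]]


def binarize(prob: ProbMap, threshold: float = 0.5) -> Mask:
    """Turn a per-pixel tampering probability map into a binary mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def f1_and_iou(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Return (F1, IoU) for one predicted mask against its ground truth."""
    tp, fp, fn = confusion_counts(pred, gt)
    union = tp + fp + fn
    if union == 0:  # nothing predicted and nothing tampered
        return 1.0, 1.0
    return 2 * tp / (2 * tp + fp + fn), tp / union


gt = [[1, 1, 0], [0, 0, 0]]
pred = binarize([[0.9, 0.2, 0.1], [0.7, 0.3, 0.0]])
f1, iou = f1_and_iou(pred, gt)  # TP=1, FP=1, FN=1 -> F1 = 0.5, IoU = 1/3
```

For a single image the two metrics are linked by F1 = 2·IoU / (1 + IoU), so they rank predictions identically there; averaged over a dataset they can still order methods differently because the relation is nonlinear.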

Provider: Science Data Bank

Created: 2026-03-16
