NINJA: Multi-Source Data for JPEG Steganalysis

Source: Mendeley Data (indexed 2026-05-21)
Download link:
https://data.mendeley.com/datasets/v6h45zb75c
Description:
The NINJA dataset comprises JPEG images collected from five real-world sources: cameras, WhatsApp, social media (Instagram, X, and Pinterest), screenshots, and web downloads. Each image retains its platform-specific compression, so the data reflects realistic conditions. The research hypothesis is that current detection algorithms, trained on controlled academic datasets, perform poorly on real-world images, which undergo processing steps such as resizing and recompression. The dataset therefore offers a realistic scenario for assessing forensic algorithms against the nonlinear distortions introduced by media sharing. The variety of sources lets researchers address cover-source mismatch, so that steganalysis models can distinguish steganographic changes from ordinary platform processing. Because the dataset provides raw JPEG images, Discrete Cosine Transform coefficient matrices (saved in .npz format), and processed statistical feature banks (saved in .npy format), it supports modeling approaches ranging from deep learning to classical statistical algorithms.

Images were acquired manually to represent a variety of real-world compression artifacts. Camera shots establish a baseline of natural image statistics, whereas social media images were downloaded directly to capture platform-specific artifacts.

To create the steganographic versions, the Discrete Cosine Transform coefficients were modified using scripts written in a MATLAB-based language together with the Phil Sallee JPEG Toolbox. Cover images of every source type were processed at three quality factors (70, 80, and 90) and embedded with the J-UNIWARD, JMiPOD, and UERD algorithms at a constant payload of 0.4 bits per nonzero AC coefficient. To process full-color images, a Decoupled Transplant Architecture was adopted.
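As a rough illustration of the payload setting above, the following sketch converts a rate of 0.4 bits per nonzero AC coefficient into a bit budget for one coefficient plane. It assumes a 2-D integer array of quantized coefficients tiled in 8x8 blocks, with each block's DC term at its top-left position; whether the dataset's .npz files use that layout is an assumption, and the function names are hypothetical.

```python
import math

import numpy as np


def nonzero_ac_count(plane: np.ndarray) -> int:
    """Count nonzero AC coefficients in a plane tiled in 8x8 blocks.

    Assumption: the DC coefficient of every block sits at its top-left
    position, i.e. at rows and columns that are multiples of 8.
    """
    nonzero_total = np.count_nonzero(plane)
    nonzero_dc = np.count_nonzero(plane[::8, ::8])
    return int(nonzero_total - nonzero_dc)


def payload_bits(plane: np.ndarray, rate: float = 0.4) -> int:
    """Bit budget for a relative payload given in bits per nonzero AC coefficient."""
    return math.floor(rate * nonzero_ac_count(plane))


# Toy 16x16 plane (four 8x8 blocks) with three DC and five AC nonzeros.
plane = np.zeros((16, 16), dtype=np.int64)
plane[0, 0], plane[8, 8], plane[8, 0] = 50, 40, 10   # DC terms
plane[0, 1], plane[1, 0], plane[3, 4] = 3, -2, 7     # AC terms
plane[9, 9], plane[15, 15] = 1, -1                   # AC terms

ac_count = nonzero_ac_count(plane)
budget = payload_bits(plane)
```

The budget scales with image content: a heavily compressed image has fewer nonzero AC coefficients and so carries fewer message bits at the same relative payload.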
The Decoupled Transplant Architecture embeds the payload in a single-channel grayscale donor and then replaces the host's luminance (Y) coefficients with the donor's modified coefficients, leaving the host's chrominance coefficients unchanged.

The feature banks (.npy) are histograms extracted from the luminance (Y) channel of each image, and they accompany the raw JPEG images and the Discrete Cosine Transform matrices (.npz). This standardized structure supports reproducibility when new methods are tested under realistic conditions. Because JPEG steganography modifies Discrete Cosine Transform coefficients, the analysis is carried out in the frequency domain. Beyond steganalysis, the data may be used for other image forensics tasks, such as anomaly detection, compression artifact analysis, or domain generalization experiments in unconstrained scenarios.
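The luminance transplant and the Y-channel histogram described above can be sketched in NumPy as follows. This is a minimal illustration, not the dataset authors' MATLAB implementation: the dictionary keys "Y", "Cb", and "Cr", the function names, and the clipped integer histogram with a bound of 5 are all assumptions, since the description does not specify the histogram's bins or the array names.

```python
import numpy as np


def transplant_luminance(host: dict, donor_y: np.ndarray) -> dict:
    """Return a copy of `host` whose Y coefficients come from the donor.

    `host` maps hypothetical channel names ("Y", "Cb", "Cr") to coefficient
    planes. The chrominance planes are copied through unchanged.
    """
    if host["Y"].shape != donor_y.shape:
        raise ValueError("host and donor luminance planes must have the same shape")
    stego = {name: plane.copy() for name, plane in host.items()}
    stego["Y"] = donor_y.copy()
    return stego


def coefficient_histogram(y_plane: np.ndarray, bound: int = 5) -> np.ndarray:
    """Histogram of coefficients clipped to [-bound, bound], one bin per integer."""
    clipped = np.clip(y_plane, -bound, bound).astype(np.int64)
    # np.bincount needs non-negative integers, so shift values by +bound.
    return np.bincount((clipped + bound).ravel(), minlength=2 * bound + 1)


# Toy host with one 8x8 block per channel.
host = {
    "Y": np.arange(64, dtype=np.int64).reshape(8, 8),
    "Cb": np.ones((8, 8), dtype=np.int64),
    "Cr": np.full((8, 8), 2, dtype=np.int64),
}
donor_y = host["Y"].copy()
donor_y[0, 1] += 1  # stand-in for an embedding change made in the grayscale donor

stego = transplant_luminance(host, donor_y)
hist = coefficient_histogram(np.array([[-7, -1, 0], [0, 1, 9]]), bound=2)
```

With the dataset's own files, the planes would come from `np.load` on a .npz file. The array names stored there are not documented in this description, so inspecting the loaded object's `.files` attribute first is a reasonable starting point.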
Created:
2026-05-11