DFRWS EU '23: Hamming Distributions of Popular Perceptual Hashing Techniques - DATASET
Indexed by NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7426034
Resource description:
Dataset Purpose and Citation
This repository contains raw data and plots for the experimental work in the paper:
McKeown, S., Buchanan, WJ. (2023 - In Press). Hamming Distributions of Popular Perceptual Hashing Techniques. DFRWS EU 2023, Bonn, Germany.
Six perceptual hashes are evaluated on the Flickr 1 Million dataset against a variety of content-preserving attacks in order to better understand their overall behaviour.
Approach
Hashes for each algorithm and modification (listed below) are created and compared using the Normalised Hamming distance. Three main comparisons are done:
Inter-score originals (original unrelated images in Flickr 1 Million)
Inter-score modified (modified unrelated versions of the Flickr images compared to each other within class, e.g. cropped to cropped)
Intra-score (original to modified version of the same image)
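As an illustration of the metric used in all three comparisons, the following minimal Python sketch computes a normalised Hamming distance between two equal-length hashes. It assumes the hashes are available as raw bytes (for example, decoded from hex strings); the function name `normalised_hamming` and the byte representation are illustrative assumptions, not the file format or code used to produce this dataset.

```python
def normalised_hamming(hash_a: bytes, hash_b: bytes) -> float:
    """Fraction of bit positions that differ between two equal-length hashes.

    0.0 means the hashes are identical, 1.0 means every bit differs.
    """
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    if not hash_a:
        raise ValueError("hashes must not be empty")
    # XOR each byte pair and count the set bits in the result.
    differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(hash_a, hash_b))
    return differing_bits / (8 * len(hash_a))


# Hypothetical 16-bit hashes, written as hex strings for readability.
original = bytes.fromhex("ff00")
modified = bytes.fromhex("0f00")
distance = normalised_hamming(original, modified)  # 4 of 16 bits differ
```

Under this definition, an intra-score compares an image's hash with the hash of its own modified version, while an inter-score compares hashes of two unrelated images.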
Repository Contents
The top level of the repository contains the SHA256 hashes of the Flickr 1 Million dataset, which were used for exact file deduplication.
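Exact file deduplication with SHA256 can be sketched as follows. This is a minimal example using Python's standard `hashlib` module; the helper names `sha256_file` and `first_seen_by_digest` are hypothetical, and the sketch does not reproduce the authors' tooling or the layout of the hash list in this repository.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_seen_by_digest(named_blobs):
    """Map each distinct SHA256 digest to the first name that produced it.

    named_blobs is an iterable of (name, bytes) pairs; later items whose
    content is byte-for-byte identical to an earlier item are dropped.
    """
    seen = {}
    for name, data in named_blobs:
        seen.setdefault(hashlib.sha256(data).hexdigest(), name)
    return seen


# Hypothetical in-memory example: b.jpg is an exact copy of a.jpg.
blobs = [("a.jpg", b"same bytes"), ("b.jpg", b"same bytes"), ("c.jpg", b"other bytes")]
unique = first_seen_by_digest(blobs)
```

Note that a cryptographic hash only detects byte-identical files; visually identical images saved with different encodings would not be merged by this step.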
There are then six sub-folders, one for each perceptual hashing algorithm, containing:
Zip files of the raw hashes generated by the algorithm for the original Flickr 1 Million dataset
A separate Inter and Intra score sub-directory containing:
-- Zipped CSV files containing the Normalised Hamming distance for comparison files
-- An analysis folder containing plotted histograms and derived statistical information (e.g. percentiles, mean)
The analysis folders were largely used to generate the figures and tables in the paper; however, more of them are present here than could be included in the conference paper.
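Derived statistics such as the mean and percentiles can be recomputed from a list of Normalised Hamming distances. The sketch below is one possible approach using only the Python standard library; it uses the nearest-rank percentile definition, which is an assumption and may differ from the method used to produce the files in the analysis folders. The function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("no values supplied")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarise(distances, percentiles=(1, 50, 99)):
    """Return the mean and the requested percentiles of the distances."""
    ordered = sorted(distances)
    summary = {"mean": statistics.fmean(ordered)}
    for p in percentiles:
        summary[f"p{p}"] = percentile(ordered, p)
    return summary


# Hypothetical distances, not taken from the dataset.
example = summarise([0.5, 0.25, 0.75, 0.5])
```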
Note that 50 million comparison samples are taken for the inter-score original comparisons (out of a possible 500 billion), while only 250k modifications were generated, resulting in a smaller pool of comparisons for the modified inter-scores and the original-to-modified intra-scores.
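The figure of roughly 500 billion follows from counting unordered pairs: assuming exactly 1,000,000 images (the deduplicated set may be slightly smaller), there are 1,000,000 × 999,999 / 2 = 499,999,500,000 possible comparisons. The sketch below checks this arithmetic and shows one way to draw a random sample of distinct pairs. The sampling strategy, seed, and function names are assumptions for illustration, not a description of how the 50 million samples were selected.

```python
import math
import random


def total_pairs(n):
    """Number of unordered pairs that can be formed from n items."""
    return math.comb(n, 2)


def sample_pairs(n, k, seed=0):
    """Draw k distinct unordered index pairs (i, j) with i < j from n items."""
    if k > total_pairs(n):
        raise ValueError("cannot sample more pairs than exist")
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < k:
        i, j = rng.sample(range(n), 2)  # two different indices
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


possible = total_pairs(1_000_000)  # 499,999,500,000
pairs = sample_pairs(1000, 50)     # small hypothetical sample
```

Rejection sampling of this kind is efficient when the sample is a small fraction of all pairs, as is the case for 50 million out of about 500 billion.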
Tested Algorithms
Blockhash - Commons Machinery
ColourHash - from the Python ImageHash Library
NeuralHash - Apple's models extracted using A. Ygvar's method
PDQ - from Facebook's ThreatExchange
Phash - from the Python ImageHash Library
Wavehash - from the Python ImageHash Library
Tested Modifications/Attacks
Border (30 pixel, black)
Compression (quality level 30 JPEG)
Crop (5% around all edges)
Mirror (x-axis)
Scaling (1.5x)
Thumbnails (Windows 96 pixel, see McKeown et al. 2019)
Watermarking (Image added to bottom-right corner, 10% of image height or minimum of 40 pixels)
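To make the parameters above concrete, the following sketch works out the pixel geometry implied by several of the modifications for a given image size. It only does the arithmetic and does not touch image data. The rounding behaviour (integer truncation), the reading of the watermark rule as "10% of the image height, but at least 40 pixels", and all function names are assumptions rather than details confirmed by the dataset description.

```python
def bordered_size(width, height, border=30):
    """Image size after adding a border of the given width on every side."""
    return width + 2 * border, height + 2 * border


def crop_box(width, height, percent=5):
    """(left, top, right, bottom) box after removing percent% from each edge."""
    dx = width * percent // 100
    dy = height * percent // 100
    return dx, dy, width - dx, height - dy


def scaled_size(width, height, factor=1.5):
    """Image size after scaling both dimensions by the given factor."""
    return round(width * factor), round(height * factor)


def watermark_height(height, percent=10, minimum=40):
    """Watermark height: percent% of the image height, but at least `minimum` px."""
    return max(height * percent // 100, minimum)


# Hypothetical 1000 x 500 pixel image.
box = crop_box(1000, 500)          # (50, 25, 950, 475)
bordered = bordered_size(1000, 500)  # (1060, 560)
scaled = scaled_size(1000, 500)      # (1500, 750)
mark = watermark_height(500)         # 50
```

For small images the minimum applies: under this reading, an image 300 pixels high would receive a 40 pixel watermark rather than a 30 pixel one.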
Created:
2022-12-13



