
DFRWS EU '23: Hamming Distributions of Popular Perceptual Hashing Techniques - DATASET

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7426034
Description:
Dataset Purpose and Citation

This repository contains raw data and plots for the experimental work in the paper:

McKeown, S., Buchanan, WJ. (2023 - In Press). Hamming Distributions of Popular Perceptual Hashing Techniques. DFRWS EU 2023, Bonn, Germany.

Six perceptual hashes are evaluated on the Flickr 1 Million dataset against a variety of content-preserving attacks in order to better understand their overall behaviour.

Approach

Hashes for each algorithm and modification (listed below) are created and compared using the Normalised Hamming distance. Three main comparisons are done:

- Inter-score originals (original unrelated images in Flickr 1 Million)
- Inter-score modified (modified unrelated versions of the Flickr images compared to each other within class, e.g. cropped to cropped)
- Intra-score (original to modified version of the same image)

Repository Contents

The top-level of the repository contains the SHA256 hashes of the Flickr 1 Million dataset used for exact file deduplication. There are then six sub-folders, one for each perceptual hashing algorithm, containing:

- Zip files of the raw hashes generated by the algorithm for the original Flickr 1 Million dataset
- A separate Inter and Intra score sub-directory containing:
-- Zipped CSV files containing the Normalised Hamming distance for comparison files
-- An analysis folder containing plotted histograms and derived statistical information (e.g. percentiles, mean)

The analysis folders were largely used to generate the Figures and Tables in the paper, however more of them are present here than was possible to include in the conference paper.

It should be noted that 50 million comparison samples are taken for the inter-score original comparisons (out of a possible 500 billion), while only 250k modifications were generated, resulting in a smaller pool of comparisons for the modified inter-scores and original to modified intra-scores.
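
As an illustration of the comparison step described above, the following is a minimal sketch of a Normalised Hamming distance and of random pair sampling for inter-scores. It assumes hashes are available as equal-length hexadecimal strings, which may not match the format of the raw hash files in this repository. The function names `normalised_hamming` and `sample_inter_scores` are hypothetical and are not taken from the authors' code. The last lines check the "possible 500 billion" figure under the assumption of exactly one million images.

```python
import math
import random


def normalised_hamming(hex_a: str, hex_b: str) -> float:
    """Fraction of differing bits between two equal-length hex-encoded hashes.

    Assumption: hashes are hex strings; the dataset's real encoding may differ.
    """
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    n_bits = 4 * len(hex_a)  # each hex digit encodes 4 bits
    differing = bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")
    return differing / n_bits


def sample_inter_scores(hashes, n_samples, seed=0):
    """Score n_samples randomly chosen pairs of distinct entries (hypothetical helper)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        a, b = rng.sample(hashes, 2)  # two different positions in the list
        scores.append(normalised_hamming(a, b))
    return scores


# Number of unordered pairs among one million images (assumed exact count).
possible_pairs = math.comb(1_000_000, 2)
print(possible_pairs)  # 499999500000, i.e. roughly 500 billion
```

A score of 0.0 means the two hashes are identical and 1.0 means every bit differs. Sampling pairs rather than enumerating them keeps the inter-score comparison tractable at this scale.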
Tested Algorithms

- Blockhash - Commons Machinery
- ColourHash - from the Python ImageHash Library
- NeuralHash - Apple's models extracted using A. Ygvar's method
- PDQ - from Facebook's ThreatExchange
- Phash - from the Python ImageHash Library
- Wavehash - from the Python ImageHash Library

Tested Modifications/Attacks

- Border (30 pixel, black)
- Compression (quality level 30 JPEG)
- Crop (5% around all edges)
- Mirror (x-axis)
- Scaling (1.5x)
- Thumbnails (Windows 96 pixel, see McKeown et al. 2019)
- Watermarking (Image added to bottom-right corner, 10% of image height or minimum of 40 pixels)
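
To make the geometry of some of the modifications above concrete, here is a small sketch that computes the resulting sizes for the border, crop, scaling and watermark cases. It is an interpretation, not the authors' implementation. It assumes the border is added on all four sides, that the crop removes 5% of the width from each of the left and right edges and 5% of the height from each of the top and bottom edges, and that the watermark height is 10% of the image height with a floor of 40 pixels. The rounding behaviour and all function names are hypothetical.

```python
def bordered_size(width, height, border=30):
    """Image size after adding a border on all four sides (assumed layout)."""
    return (width + 2 * border, height + 2 * border)


def crop_box(width, height, percent=5):
    """(left, upper, right, lower) box after trimming `percent` from every edge.

    Integer arithmetic is used so results do not depend on float rounding.
    """
    dx = width * percent // 100
    dy = height * percent // 100
    return (dx, dy, width - dx, height - dy)


def scaled_size(width, height, factor=1.5):
    """Image size after uniform scaling, rounded to whole pixels (assumed)."""
    return (round(width * factor), round(height * factor))


def watermark_height(image_height, percent=10, minimum=40):
    """Watermark height: a share of the image height, but never below `minimum`."""
    return max(minimum, image_height * percent // 100)


if __name__ == "__main__":
    w, h = 500, 400
    print(bordered_size(w, h))
    print(crop_box(w, h))
    print(scaled_size(w, h))
    print(watermark_height(h))
```

The box returned by `crop_box` uses the (left, upper, right, lower) ordering, so it could be passed to an image library that accepts that convention, such as Pillow's `Image.crop`.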
Created:
2022-12-13