jiamingzz/geo_6k

Name: jiamingzz/geo_6k
Creator: jiamingzz
Published: 2025-12-11 05:50:08
License: No description available

Hugging Face · Updated 2025-12-11 · Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/jiamingzz/geo_6k


Resource description:

---
license: mit
task_categories:
- image-classification
- visual-question-answering
language:
- en
tags:
- privacy
- adversarial-attack
- geographic-reasoning
- multimodal
size_categories:
- 1K<n<10K
---

# GeoPrivacy-6K

**[Project Page](https://jiamingzz94.github.io/reasonbreak/)** | **[Paper](https://arxiv.org/abs/2512.08503)** | **[Code](https://github.com/jiamingzhang94/ReasonBreak)**

## Introduction

**GeoPrivacy-6K** is a specialized dataset comprising **6,341 ultra-high-resolution images** ($\ge$ 2K resolution) designed to study and defend against reasoning-based privacy threats. It was introduced in the paper **"Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models"** to train the **ReasonBreak** adversarial framework.

### Data Sources

The images are carefully curated and filtered from three established high-quality vision datasets to ensure diverse coverage of urban, natural, and aesthetic scenes:

* **HoliCity**: Urban environments with rich architectural details.
* **Aesthetic-4K**: Diverse high-quality scenes with strong composition.
* **LHQ (Landscapes HQ)**: Ultra-high-resolution natural landscapes.

### Privacy Note

This dataset focuses on *geographic* privacy. **It does not contain personally identifiable information (PII).** Sensitive elements such as human faces or license plates have been processed or anonymized in the original source datasets.

## Dataset Highlights

* **Hierarchical Conceptual Annotations**: Unlike traditional geolocation datasets that rely on GPS coordinates, GeoPrivacy-6K provides **multi-level visual concept annotations** (e.g., *"Gothic architecture"*, *"Deciduous forest"*) spanning Continental, National, City, and Local levels. This enables models to learn the *reasoning logic* behind location inference rather than just memorizing coordinates.
* **Fine-Grained Details**: All images maintain ultra-high resolution to preserve subtle cues (signage, vegetation patterns, architectural styles) that modern Multimodal Large Reasoning Models (MLRMs) exploit.

## Dataset Structure

* **Images**: Located in the root directory (zipped).
* **Annotations**: `location_analysis_fixed.jsonl` contains the reasoning chains, hierarchical concepts, and spatial bounding boxes for each image.

## Usage

This dataset is primarily designed for training the **ReasonBreak** generator. Please refer to the [GitHub Repository](https://github.com/jiamingzhang94/ReasonBreak) for:

* Data loading scripts.
* Training instructions.

## Citation

If you use this dataset, please cite our paper:

```bibtex
@article{zhang2025reasonbreak,
  title={Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models},
  author={Zhang, Jiaming and Wang, Che and Cao, Yang and Huang, Longtao and Lim, Wei Yang Bryan},
  journal={arXiv preprint arXiv:2512.08503},
  year={2025}
}
```
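
The card names the annotation file `location_analysis_fixed.jsonl` but does not document its schema, and the official loading scripts live in the GitHub repository. As a hedged first look, the sketch below reads a JSON Lines file and counts which top-level keys actually occur. It assumes no particular field names; the local path and the helper names `read_jsonl` and `summarize_keys` are illustrative and not part of the dataset or the ReasonBreak code.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: invalid JSON on line {line_number}") from exc


def summarize_keys(records):
    """Return (record count, Counter of top-level keys seen across records)."""
    counts = Counter()
    total = 0
    for record in records:
        total += 1
        if isinstance(record, dict):
            counts.update(record.keys())
    return total, counts


if __name__ == "__main__":
    # Assumption: the annotation file has been downloaded to the working directory.
    annotation_path = Path("location_analysis_fixed.jsonl")
    if annotation_path.exists():
        total, counts = summarize_keys(read_jsonl(annotation_path))
        print(f"{total} records")
        for key, count in counts.most_common():
            print(f"  {key}: present in {count} records")
```

Inspecting the keys first avoids hard-coding field names that the card does not confirm; once the real schema is known, the repository's own loaders are the better choice for training.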

Provided by:

jiamingzz
