ashleyscruse/noble-ai-evidence-benchmark
Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ashleyscruse/noble-ai-evidence-benchmark
---
license: cc-by-4.0
task_categories:
- image-classification
tags:
- ai-detection
- synthetic-image-detection
- law-enforcement
- benchmark
- image-forensics
size_categories:
- 1K<n<10K
---
# NOBLE AI-Generated Evidence Detection Benchmark
A benchmark dataset for evaluating AI-generated image detection tools on law enforcement content (bodycam, surveillance, evidence photos).
## Dataset Description
Existing AI image detection tools are trained on social media content and have not been tested on the grainy, compressed, low-quality images typical of law enforcement contexts. This dataset fills that gap.
### Sources
| Type | Count | Source |
|------|-------|--------|
| Real images | 422 | Open Images V7 (validation split) |
| Synthetic images | 495 | Generated with 3 diffusion models |
| **Total** | **917** | |
### Real Image Categories
| Category | Count |
|----------|-------|
| People | 93 |
| Vehicles | 81 |
| Indoor scenes | 81 |
| Outdoor scenes | 82 |
| Objects | 85 |
### Synthetic Image Generators
| Generator | Count | Model ID |
|-----------|-------|----------|
| Stable Diffusion 1.5 | 165 | runwayml/stable-diffusion-v1-5 |
| OpenJourney v4 | 165 | prompthero/openjourney-v4 |
| Realistic Vision 5.1 | 165 | SG161222/Realistic_Vision_V5.1_noVAE |
### Degradation Levels
Each image appears at 3 quality levels simulating real-world law enforcement conditions:
| Level | Parameters | Simulates |
|-------|-----------|-----------|
| Clean | None | High-quality digital photos |
| Moderate | JPEG Q50 + blur sigma=1 + contrast 0.8x | Decent surveillance footage |
| Heavy | JPEG Q30 + downscale 50% + noise sigma=25 + blur sigma=2 | Poor bodycam / old CCTV |
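The table gives the degradation parameters but not the code that applied them. The following is a minimal sketch of how the three levels could be reproduced with Pillow and NumPy, not the authors' actual pipeline. The order of operations, the bilinear resampling filter, the reading of "downscale 50%" as halving each side (without upscaling back), the noise scale in 0-255 pixel units, and the helper names `jpeg_roundtrip`, `add_gaussian_noise`, and `degrade` are all assumptions.

```python
import io

import numpy as np
from PIL import Image, ImageEnhance, ImageFilter


def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    """Encode the image as JPEG at the given quality, then decode it again."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def add_gaussian_noise(img: Image.Image, sigma: float, seed: int = 0) -> Image.Image:
    """Add zero-mean Gaussian noise (sigma in 0-255 pixel units) and clip."""
    rng = np.random.default_rng(seed)
    arr = np.asarray(img, dtype=np.float32)
    noisy = arr + rng.normal(0.0, sigma, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def degrade(img: Image.Image, level: str) -> Image.Image:
    """Apply one of the three quality levels; the step order is an assumption."""
    img = img.convert("RGB")
    if level == "clean":
        return img
    if level == "moderate":
        # Pillow's GaussianBlur radius is the standard deviation of the kernel.
        img = img.filter(ImageFilter.GaussianBlur(radius=1))
        img = ImageEnhance.Contrast(img).enhance(0.8)
        return jpeg_roundtrip(img, quality=50)
    if level == "heavy":
        w, h = img.size
        # Assumed meaning of "downscale 50%": halve each side.
        img = img.resize((max(1, w // 2), max(1, h // 2)), Image.BILINEAR)
        img = img.filter(ImageFilter.GaussianBlur(radius=2))
        img = add_gaussian_noise(img, sigma=25)
        return jpeg_roundtrip(img, quality=30)
    raise ValueError(f"unknown level: {level}")
```

JPEG compression is applied last here so that the compression artifacts are not smoothed away by the blur; a pipeline that compresses first would produce visibly different images.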
## Dataset Structure
```
raw/
  real/{people,vehicles,indoor_scenes,outdoor_scenes,objects}/
  synthetic/{surveillance_security,evidence_style,bodycam_style,documents}/
processed/
  clean/{real,synthetic}/
  moderate/{real,synthetic}/
  heavy/{real,synthetic}/
results/
  metrics/
  figures/
```
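Given this layout, one way to build an evaluation index is to walk `processed/` and derive the quality level and the real/synthetic label from each directory name. The sketch below assumes the repository has been downloaded to a local folder, that image files use `.jpg`, `.jpeg`, or `.png` extensions, and that synthetic images are labeled 1; the function name `index_processed` is hypothetical.

```python
from pathlib import Path

LEVELS = ("clean", "moderate", "heavy")
LABELS = {"real": 0, "synthetic": 1}  # assumed convention: synthetic is the positive class
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file extensions


def index_processed(root):
    """Return (path, level, label) records for images under root/processed/."""
    records = []
    for level in LEVELS:
        for class_name, label in LABELS.items():
            class_dir = Path(root) / "processed" / level / class_name
            if not class_dir.is_dir():
                continue  # tolerate partially downloaded copies
            for path in sorted(class_dir.rglob("*")):
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                    records.append((path, level, label))
    return records
```

Calling `index_processed("path/to/dataset")` on a complete copy should yield the same number of records for each quality level, since every image appears once per level.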
## Preliminary Results
Evaluation of a Hugging Face AI image detector (`umm-maybe/AI-image-detector`) on this benchmark:
| Quality Level | Accuracy | F1 | AUC-ROC |
|---------------|----------|----|---------|
| Clean | 44.5% | 37.4% | 0.402 |
| Moderate | 45.1% | 17.1% | 0.381 |
| Heavy | 34.5% | 18.2% | 0.265 |
The detector performs **worse than random chance** on law enforcement content at every quality level (accuracy below 50%, AUC-ROC below 0.5). AUC-ROC falls with each degradation step, and accuracy drops sharply at the heavy level.
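For reference when reading the table: with 495 synthetic and 422 real images per quality level, a classifier that always predicts "synthetic" would reach about 54.0% accuracy (495/917). The sketch below shows how the three reported metrics can be computed from labels and detector scores in plain Python. It is not the authors' evaluation code; treating synthetic as the positive class (label 1) and thresholding scores at 0.5 are assumptions, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_roc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. A value below 0.5 means the scores rank
    the two classes in the wrong order more often than not.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (not benchmark data): 1 = synthetic, 0 = real.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical detector outputs
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold
```

On the toy data, `accuracy(y_true, y_pred)` and `f1_score(y_true, y_pred)` are both 0.5, and `auc_roc(y_true, scores)` is 0.75.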
## Uses
- Benchmarking AI-generated image detection tools on domain-specific content
- Studying the effect of image degradation on detection accuracy
- Training improved detection models for law enforcement contexts
## Citation
If you use this dataset, please cite:
```
@misc{scruse2026noble,
title={NOBLE AI-Generated Evidence Detection Benchmark},
author={Scruse, Ashley and Gosha, Kinnis},
year={2026},
publisher={HuggingFace},
}
```
## Funding
This work is funded by a $25,000 research grant from NOBLE (National Organization of Black Law Enforcement Executives).
## Contact
**PI:** Dr. Ashley Scruse, Morehouse College
Provider: ashleyscruse



