ShmalexFlow/whiteout-compliance-benchmark
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ShmalexFlow/whiteout-compliance-benchmark
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- compliance
- ai-governance
- enterprise-security
- academic-integrity
- data-loss-prevention
- FERPA
- HIPAA
- GDPR
- PII
- PHI
size_categories:
- 10K<n<100K
---
# Whiteout AI Compliance Benchmark
A 15,915-prompt benchmark for evaluating AI compliance engines, that is, systems that enforce content policies on user prompts before they reach AI providers.
Built by [Groovy Security](https://groovysec.com) for the Whiteout AI platform.
## Dataset Summary
| Property | Value |
|----------|-------|
| Total prompts | 15,915 |
| Categories | 9 (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education) |
| Policies | 74 across all categories |
| Prompt types | 3 (safe, violation, edge_case) |
| Character length range | 16 to 12,000+ |
| Language | English |
## Structure
Each row contains:
- `text`: the prompt to be evaluated
- `category`: policy category (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education)
- `policy_id`: specific policy being tested (e.g., `block_ssn`, `detect_exam_cheating`)
- `prompt_type`: one of `safe` (should pass), `violation` (should be blocked), `edge_case` (borderline, should pass)
- `expected`: `pass` or `block`
- `length_chars`: character count
- `length_bucket`: size category (<100, 100-300, 300-800, 800-2K, 2K-5K, 5K-10K, 10K+)
- `phase`: `phase1_short`, `phase2_long`, or `education`
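
The field descriptions above imply a few per-row invariants that can be checked before running an evaluation. The following is a minimal sketch, not part of the dataset's tooling: `row_problems` and `length_bucket` are hypothetical helper names, the bucket boundaries are assumed to be exclusive upper bounds with "2K" meaning 2,000 characters, and `length_chars` is assumed to equal Python's `len(text)`. None of these assumptions is confirmed by the card.

```python
# Mapping stated in the card: safe and edge_case prompts should pass,
# violation prompts should be blocked.
EXPECTED_BY_TYPE = {"safe": "pass", "violation": "block", "edge_case": "pass"}

# (exclusive upper bound, label) for each length bucket.
# Assumption: the card lists the labels but not the exact boundary rules.
BUCKET_BOUNDS = [
    (100, "<100"),
    (300, "100-300"),
    (800, "300-800"),
    (2_000, "800-2K"),
    (5_000, "2K-5K"),
    (10_000, "5K-10K"),
]


def length_bucket(n_chars: int) -> str:
    """Return the bucket label for a character count (assumed boundaries)."""
    for upper, label in BUCKET_BOUNDS:
        if n_chars < upper:
            return label
    return "10K+"


def row_problems(row: dict) -> list[str]:
    """List the inconsistencies found in one row; empty means none found."""
    problems = []
    wanted = EXPECTED_BY_TYPE.get(row["prompt_type"])
    if wanted is None:
        problems.append(f"unknown prompt_type: {row['prompt_type']!r}")
    elif row["expected"] != wanted:
        problems.append(f"expected={row['expected']!r}, but prompt_type implies {wanted!r}")
    # Assumption: length_chars counts characters the same way len() does.
    if row["length_chars"] != len(row["text"]):
        problems.append("length_chars does not match len(text)")
    return problems
```

If a check fails on real rows, the likelier explanation is that one of the assumptions above is wrong, not that the data is wrong.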
## Phases
- **Phase 1 (13,792 prompts)**: Short prompts (<2K chars) covering 8 enterprise data protection categories
- **Phase 2 (1,116 prompts)**: Long-form prompts (1K-12K chars), including emails, memos, chat transcripts, reports, and documents
- **Education (1,007 prompts)**: Academic integrity (student perspective) + institutional data protection (faculty perspective)
## Benchmark Results
Evaluated against Whiteout AI's semantic compliance engine (qwen3.5:27b; purely semantic classification, no regex rules):
| Phase | Prompts | Accuracy |
|-------|---------|----------|
| Phase 1 (short) | 13,792 | 99.13% |
| Phase 2 (long) | 1,116 | 99.91% |
| Education | 1,007 | 99.21% |
| **Combined** | **15,915** | **99.19%** |
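
The combined figure is consistent with a prompt-weighted mean of the three per-phase accuracies. The short check below uses only the numbers in the table; `weighted_accuracy` is an illustrative helper name, and the card does not state that this is how the combined value was actually computed.

```python
# (phase, prompt count, accuracy in percent), copied from the results table.
RESULTS = [
    ("Phase 1 (short)", 13_792, 99.13),
    ("Phase 2 (long)", 1_116, 99.91),
    ("Education", 1_007, 99.21),
]


def weighted_accuracy(results):
    """Average the per-phase accuracies, weighting each by its prompt count."""
    total = sum(count for _, count, _ in results)
    return sum(count * acc for _, count, acc in results) / total


total_prompts = sum(count for _, count, _ in RESULTS)
combined = weighted_accuracy(RESULTS)
print(total_prompts, round(combined, 2))  # 15915 99.19
```

Because Phase 1 accounts for about 87% of the prompts, the combined accuracy sits close to the Phase 1 value.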
## Usage
```python
from datasets import load_dataset
ds = load_dataset("ShmalexFlow/whiteout-compliance-benchmark")
# Filter by category
phi_prompts = ds["train"].filter(lambda x: x["category"] == "PHI")
# Filter by type
violations = ds["train"].filter(lambda x: x["prompt_type"] == "violation")
# Filter by phase
long_prompts = ds["train"].filter(lambda x: x["phase"] == "phase2_long")
```
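
To score your own engine against the `expected` labels, one possible approach is sketched below. It is a sketch under stated assumptions, not the harness used for the published results: `score` and `always_pass` are hypothetical names, and the classifier is assumed to be a plain callable that takes the prompt text and returns `"pass"` or `"block"`.

```python
from collections import defaultdict


def score(rows, classify):
    """Return accuracy overall and per phase for a text -> 'pass'/'block' callable."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        correct = classify(row["text"]) == row["expected"]
        # Count each row once overall and once under its own phase.
        for key in ("overall", row["phase"]):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


def always_pass(text: str) -> str:
    # Hypothetical trivial baseline, not a real compliance engine.
    return "pass"
```

With the dataset loaded as in the snippet above, `score(ds["train"], always_pass)` would give the accuracy of a baseline that never blocks anything, which is a useful floor to compare a real engine against.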
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{whiteout_compliance_benchmark_2026,
title={Whiteout AI Compliance Benchmark},
author={Groovy Security},
year={2026},
url={https://huggingface.co/datasets/ShmalexFlow/whiteout-compliance-benchmark},
note={15,915 prompts for evaluating AI compliance engines across enterprise and education domains}
}
```
## License
Apache 2.0
## Contact
- Product: [Whiteout AI](https://groovysec.com/whiteout-ai)
- Company: [Groovy Security](https://groovysec.com)
Provider:
ShmalexFlow



