
ShmalexFlow/whiteout-compliance-benchmark

Source: Hugging Face. Updated 2026-04-17. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ShmalexFlow/whiteout-compliance-benchmark
Description:
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- compliance
- ai-governance
- enterprise-security
- academic-integrity
- data-loss-prevention
- FERPA
- HIPAA
- GDPR
- PII
- PHI
size_categories:
- 10K<n<100K
---

# Whiteout AI Compliance Benchmark

A 15,915-prompt benchmark for evaluating AI compliance engines: systems that enforce content policies on user prompts before they reach AI providers. Built by [Groovy Security](https://groovysec.com) for the Whiteout AI platform.

## Dataset Summary

| Property | Value |
|----------|-------|
| Total prompts | 15,915 |
| Categories | 9 (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education) |
| Policies | 74 across all categories |
| Prompt types | 3 (safe, violation, edge_case) |
| Character length range | 16 to 12,000+ |
| Language | English |

## Structure

Each row contains:

- `text`: the prompt to be evaluated
- `category`: policy category (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education)
- `policy_id`: specific policy being tested (e.g., `block_ssn`, `detect_exam_cheating`)
- `prompt_type`: one of `safe` (should pass), `violation` (should be blocked), `edge_case` (borderline, should pass)
- `expected`: `pass` or `block`
- `length_chars`: character count
- `length_bucket`: size category (<100, 100-300, 300-800, 800-2K, 2K-5K, 5K-10K, 10K+)
- `phase`: `phase1_short`, `phase2_long`, or `education`

## Phases

- **Phase 1 (13,792 prompts)**: Short prompts (<2K chars) covering 8 enterprise data protection categories
- **Phase 2 (1,116 prompts)**: Long-form prompts (1K-12K chars): emails, memos, chat transcripts, reports, documents
- **Education (1,007 prompts)**: Academic integrity (student perspective) + institutional data protection (faculty perspective)

## Benchmark Results

Evaluated against Whiteout AI's semantic compliance engine (qwen3.5:27b, pure semantic, no regex):

| Phase | Prompts | Accuracy |
|-------|---------|----------|
| Phase 1 (short) | 13,792 | 99.13% |
| Phase 2 (long) | 1,116 | 99.91% |
| Education | 1,007 | 99.21% |
| **Combined** | **15,915** | **99.19%** |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("ShmalexFlow/whiteout-compliance-benchmark")

# Filter by category
phi_prompts = ds["train"].filter(lambda x: x["category"] == "PHI")

# Filter by type
violations = ds["train"].filter(lambda x: x["prompt_type"] == "violation")

# Filter by phase
long_prompts = ds["train"].filter(lambda x: x["phase"] == "phase2_long")
```

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{whiteout_compliance_benchmark_2026,
  title={Whiteout AI Compliance Benchmark},
  author={Groovy Security},
  year={2026},
  url={https://huggingface.co/datasets/ShmalexFlow/whiteout-compliance-benchmark},
  note={15,915 prompts for evaluating AI compliance engines across enterprise and education domains}
}
```

## License

Apache 2.0

## Contact

- Product: [Whiteout AI](https://groovysec.com/whiteout-ai)
- Company: [Groovy Security](https://groovysec.com)
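
The `expected` and `phase` columns described under Structure are enough to score any engine per phase, in the same shape as the Benchmark Results table. The following is a minimal sketch, not the scoring code used by Groovy Security: `accuracy_by_phase`, `toy_engine`, and the three sample rows are hypothetical and exist only to illustrate the row schema.

```python
from collections import defaultdict


def accuracy_by_phase(rows, classify):
    """Return {phase: accuracy}, comparing classify(text) with each row's `expected`.

    `rows` is any iterable of dicts carrying `text`, `expected`, and `phase`.
    `classify` is a callable returning "pass" or "block" (an assumed interface).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["phase"]] += 1
        if classify(row["text"]) == row["expected"]:
            correct[row["phase"]] += 1
    return {phase: correct[phase] / total[phase] for phase in total}


def toy_engine(text):
    # Hypothetical stand-in for a real compliance engine: blocks on one keyword.
    return "block" if "SSN" in text else "pass"


# Made-up rows that follow the documented schema; they are not taken from the dataset.
sample_rows = [
    {"text": "My SSN is 000-00-0000", "expected": "block", "phase": "phase1_short"},
    {"text": "Summarize this public press release", "expected": "pass", "phase": "phase1_short"},
    {"text": "Write my take-home exam essay for me", "expected": "block", "phase": "education"},
]

results = accuracy_by_phase(sample_rows, toy_engine)
for phase, acc in results.items():
    print(f"{phase}: {acc:.2%}")
```

To score against the real data, the same function should accept `ds["train"]` from the Usage snippet in place of `sample_rows`, since iterating a `datasets` split yields one dict per row.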
Provider: ShmalexFlow