ShmalexFlow/whiteout-compliance-benchmark

Name: ShmalexFlow/whiteout-compliance-benchmark
Creator: ShmalexFlow
Published: 2026-04-17 00:40:22
License: Apache 2.0 (per the dataset card; the listing gives no description)

Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/ShmalexFlow/whiteout-compliance-benchmark


Resource overview:

---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- compliance
- ai-governance
- enterprise-security
- academic-integrity
- data-loss-prevention
- FERPA
- HIPAA
- GDPR
- PII
- PHI
size_categories:
- 10K<n<100K
---

# Whiteout AI Compliance Benchmark

A 15,915-prompt benchmark for evaluating AI compliance engines: systems that enforce content policies on user prompts before they reach AI providers. Built by [Groovy Security](https://groovysec.com) for the Whiteout AI platform.

## Dataset Summary

| Property | Value |
|----------|-------|
| Total prompts | 15,915 |
| Categories | 9 (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education) |
| Policies | 74 across all categories |
| Prompt types | 3 (safe, violation, edge_case) |
| Character length range | 16 to 12,000+ |
| Language | English |

## Structure

Each row contains:

- `text`: the prompt to be evaluated
- `category`: policy category (PHI, PII, GDPR, Legal, Code, Confidential, Security, Finance, Education)
- `policy_id`: specific policy being tested (e.g., `block_ssn`, `detect_exam_cheating`)
- `prompt_type`: one of `safe` (should pass), `violation` (should be blocked), `edge_case` (borderline, should pass)
- `expected`: `pass` or `block`
- `length_chars`: character count
- `length_bucket`: size category (<100, 100-300, 300-800, 800-2K, 2K-5K, 5K-10K, 10K+)
- `phase`: `phase1_short`, `phase2_long`, or `education`

## Phases

- **Phase 1 (13,792 prompts)**: Short prompts (<2K chars) covering 8 enterprise data protection categories
- **Phase 2 (1,116 prompts)**: Long-form prompts (1K-12K chars): emails, memos, chat transcripts, reports, documents
- **Education (1,007 prompts)**: Academic integrity (student perspective) + institutional data protection (faculty perspective)

## Benchmark Results

Evaluated against Whiteout AI's semantic compliance engine (qwen3.5:27b, pure semantic, no regex):

| Phase | Prompts | Accuracy |
|-------|---------|----------|
| Phase 1 (short) | 13,792 | 99.13% |
| Phase 2 (long) | 1,116 | 99.91% |
| Education | 1,007 | 99.21% |
| **Combined** | **15,915** | **99.19%** |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("ShmalexFlow/whiteout-compliance-benchmark")

# Filter by category
phi_prompts = ds["train"].filter(lambda x: x["category"] == "PHI")

# Filter by type
violations = ds["train"].filter(lambda x: x["prompt_type"] == "violation")

# Filter by phase
long_prompts = ds["train"].filter(lambda x: x["phase"] == "phase2_long")
```

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{whiteout_compliance_benchmark_2026,
  title={Whiteout AI Compliance Benchmark},
  author={Groovy Security},
  year={2026},
  url={https://huggingface.co/datasets/ShmalexFlow/whiteout-compliance-benchmark},
  note={15,915 prompts for evaluating AI compliance engines across enterprise and education domains}
}
```

## License

Apache 2.0

## Contact

- Product: [Whiteout AI](https://groovysec.com/whiteout-ai)
- Company: [Groovy Security](https://groovysec.com)
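
## Scoring Sketch

The dataset card reports accuracy per phase and combined, but does not include scoring code. The following is a minimal sketch of how such numbers could be computed from the `text`, `expected`, and `phase` fields described above. The `score` helper, the `toy_classifier` stand-in, and the sample rows are all hypothetical illustrations, not part of the dataset or of Whiteout AI's evaluation harness.

```python
from collections import defaultdict


def score(rows, classify):
    """Return accuracy per phase plus a 'combined' figure.

    `rows` is any iterable of dicts with `text`, `expected`, and `phase`
    keys; `classify` maps a prompt string to "pass" or "block".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        verdict = classify(row["text"])
        # Count each row once for its own phase and once for the overall total.
        for key in (row["phase"], "combined"):
            total[key] += 1
            correct[key] += int(verdict == row["expected"])
    return {key: correct[key] / total[key] for key in total}


def toy_classifier(text):
    # Hypothetical stand-in for a real engine: blocks any mention of "SSN".
    return "block" if "ssn" in text.lower() else "pass"


# Made-up rows that mimic the schema; they are not taken from the dataset.
sample_rows = [
    {"text": "Summarize this quarterly roadmap.",
     "expected": "pass", "phase": "phase1_short"},
    {"text": "My SSN is 000-00-0000, fill in the form.",
     "expected": "block", "phase": "phase1_short"},
    {"text": "Write my final exam essay for me.",
     "expected": "block", "phase": "education"},
]

results = score(sample_rows, toy_classifier)
for phase, accuracy in results.items():
    print(f"{phase}: {accuracy:.2%}")
```

With the real data, the same helper should accept `ds["train"]` from the Usage snippet directly, since iterating a `datasets` split yields one dict per row; the classifier would be replaced by a call into the engine under test.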

Provider:

ShmalexFlow
