
mukunda1729/pii-detection-fixtures

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/mukunda1729/pii-detection-fixtures
Overview:
---
license: mit
language:
- en
tags:
- pii
- privacy
- security
- llm
- testing
- red-team
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data.jsonl
---

# pii-detection-fixtures

25 short text snippets labeled for PII (Personally Identifiable Information) and secrets. Designed as a small, hand-curated fixture set for testing PII redaction pipelines, agent guardrails, and LLM prompt sanitizers.

All data is **synthetic**: no real people, real keys, or real accounts.

## PII / secret types covered

| Type | Examples in this set |
|---|---|
| `email` | 3 |
| `phone` | 2 |
| `ssn` | 1 |
| `dob` | 1 |
| `credit_card` | 1 |
| `address` | 1 |
| `name` | 2 |
| `medical_record` | 1 |
| `passport` | 1 |
| `coordinates` | 1 |
| `bank_routing`, `bank_account` | 1 each |
| `mac_address`, `ip_address` | 1 each |
| `api_key`, `aws_access_key`, `github_token`, `slack_token`, `stripe_key`, `jwt`, `credentials_url` | 1 each |
| **No PII (control)** | 5 |

## Schema

```jsonc
{
  "id": "string",
  "text": "string",                      // the input snippet
  "has_pii": true,                       // boolean overall flag
  "pii_types": ["email", "phone", ...],  // unique types found
  "spans": [
    {"start": 0, "end": 17, "type": "email", "value": "alice@example.com"}
  ]
}
```

`spans` use **character offsets** into `text`.

## Quickstart

```python
from datasets import load_dataset

ds = load_dataset("mukunda1729/pii-detection-fixtures", split="train")
controls = [r for r in ds if not r["has_pii"]]
print(f"{len(controls)} negative examples (no PII)")
```

## Related

- [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/)
- [`agentguard` on PyPI](https://pypi.org/project/agentguard-firewall/): egress firewall (blocks data exfil at the network layer)

## License

MIT.
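
## Example: checking and applying spans

Because `spans` are character offsets into `text`, a record can be sanity-checked by slicing `text` and comparing the result with `value`. The sketch below is illustrative only and is not part of the dataset. It assumes `end` is exclusive, which matches the schema example (`alice@example.com` is 17 characters, so offsets 0 to 17), and that `pii_types` equals the set of span types. The helper names `check_record` and `redact`, and the sample record, are hypothetical and written to follow the schema above.

```python
def check_record(record):
    """Return a list of problems found in one record (empty if consistent)."""
    problems = []
    text = record["text"]
    for span in record["spans"]:
        start, end = span["start"], span["end"]
        # Assumption: offsets are 0-based and `end` is exclusive.
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {start}-{end} is out of range")
            continue
        if text[start:end] != span["value"]:
            problems.append(
                f"span {start}-{end} is {text[start:end]!r}, "
                f"expected {span['value']!r}"
            )
    # Assumption: pii_types lists exactly the types that appear in spans.
    span_types = {span["type"] for span in record["spans"]}
    if span_types != set(record["pii_types"]):
        problems.append("pii_types does not match the span types")
    return problems


def redact(record):
    """Replace each labeled span with a placeholder such as [EMAIL]."""
    text = record["text"]
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(record["spans"], key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['type'].upper()}]"
        text = text[: span["start"]] + placeholder + text[span["end"]:]
    return text


# Hypothetical record in the dataset's schema (not taken from data.jsonl).
sample = {
    "id": "example-1",
    "text": "alice@example.com called from 555-0100.",
    "has_pii": True,
    "pii_types": ["email", "phone"],
    "spans": [
        {"start": 0, "end": 17, "type": "email", "value": "alice@example.com"},
        {"start": 30, "end": 38, "type": "phone", "value": "555-0100"},
    ],
}

print(check_record(sample))  # []
print(redact(sample))        # [EMAIL] called from [PHONE].
```

The same helpers should work on rows loaded in the Quickstart, since each row is a dictionary with these fields. Comparing a pipeline's output with `redact(row)` is one possible way to use the fixtures as expected results.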
Provided by:
mukunda1729