mukunda1729/pii-detection-fixtures
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/mukunda1729/pii-detection-fixtures
Resource description:
---
license: mit
language:
- en
tags:
- pii
- privacy
- security
- llm
- testing
- red-team
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data.jsonl
---
# pii-detection-fixtures
25 short text snippets labeled for PII (Personally Identifiable Information) and secrets. Designed as a small, hand-curated fixture set for testing PII redaction pipelines, agent guardrails, and LLM prompt sanitizers.
All data is **synthetic** — no real people, real keys, or real accounts.
## PII / secret types covered
| Type | Examples in this set |
|---|---|
| `email` | 3 |
| `phone` | 2 |
| `ssn` | 1 |
| `dob` | 1 |
| `credit_card` | 1 |
| `address` | 1 |
| `name` | 2 |
| `medical_record` | 1 |
| `passport` | 1 |
| `coordinates` | 1 |
| `bank_routing`, `bank_account` | 1 each |
| `mac_address`, `ip_address` | 1 each |
| `api_key`, `aws_access_key`, `github_token`, `slack_token`, `stripe_key`, `jwt`, `credentials_url` | 1 each |
| **No PII (control)** | 5 |
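To reproduce a tally like the one above, the per-type counts can be computed from the `has_pii` and `pii_types` fields described in the Schema section. The following is a minimal sketch, not the author's script: `count_types` and the two sample rows are hypothetical, and it assumes the table counts rows rather than individual spans.

```python
from collections import Counter


def count_types(rows):
    """Count how many rows mention each PII type, plus control rows.

    Assumption: each type is counted once per row, even if it
    appears in several spans of the same snippet.
    """
    counts = Counter()
    for row in rows:
        if not row["has_pii"]:
            counts["no_pii_control"] += 1
        counts.update(set(row["pii_types"]))
    return counts


# Hypothetical rows shaped like the dataset records.
sample_rows = [
    {"has_pii": True, "pii_types": ["email", "phone"]},
    {"has_pii": False, "pii_types": []},
]
print(count_types(sample_rows))
```

Because one snippet can carry several types, the per-type counts may add up to more than the number of snippets.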
## Schema
```jsonc
{
  "id": "string",
  "text": "string",                      // the input snippet
  "has_pii": true,                       // boolean overall flag
  "pii_types": ["email", "phone", ...],  // unique types found
  "spans": [
    {"start": 0, "end": 17, "type": "email", "value": "alice@example.com"}
  ]
}
```
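`spans` use **character offsets** into `text`. The schema example implies that `end` is exclusive: the 17-character value `alice@example.com` occupies offsets 0 to 17.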
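A quick way to confirm the offset convention is to slice `text` with each span and compare the result to `value`. This is a minimal sketch under two assumptions: `end` is exclusive (as the schema example suggests), and `pii_types` lists exactly the types that appear in `spans`. The helper `check_spans` and the sample record are hypothetical and not part of the dataset.

```python
def check_spans(record):
    """Return a list of problems found in one record's span annotations."""
    problems = []
    text = record["text"]
    spans = record.get("spans") or []
    for span in spans:
        start, end = span["start"], span["end"]
        # Offsets must fall inside the text and describe a non-empty range.
        if not (0 <= start < end <= len(text)):
            problems.append(f"{record['id']}: span {start}:{end} is out of range")
            continue
        # Assumption: `end` is exclusive, so a plain slice recovers the value.
        if text[start:end] != span["value"]:
            problems.append(
                f"{record['id']}: text[{start}:{end}] is {text[start:end]!r}, "
                f"expected {span['value']!r}"
            )
    # Assumption: `pii_types` holds the unique types present in `spans`.
    if set(record["pii_types"]) != {s["type"] for s in spans}:
        problems.append(f"{record['id']}: pii_types does not match span types")
    return problems


# Hypothetical record built from the schema example.
example = {
    "id": "example-1",
    "text": "alice@example.com is my email",
    "has_pii": True,
    "pii_types": ["email"],
    "spans": [
        {"start": 0, "end": 17, "type": "email", "value": "alice@example.com"}
    ],
}
print(check_spans(example))
```

An empty list means every span slices back to its recorded value.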
## Quickstart
```python
from datasets import load_dataset
ds = load_dataset("mukunda1729/pii-detection-fixtures", split="train")
controls = [r for r in ds if not r["has_pii"]]
print(f"{len(controls)} negative examples (no PII)")
```
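Since the set is intended for testing redaction pipelines, one natural use is to score a detector against the `has_pii` flag. The sketch below is illustrative only: `detect_pii` is a toy, email-only regex standing in for whatever detector is under test, and `score` and `sample_rows` are hypothetical names, not part of any library mentioned here.

```python
import re

# Toy pattern (assumption): matches only simple email-like strings.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def detect_pii(text):
    """Stand-in detector: returns True if the text contains an email-like string."""
    return EMAIL_RE.search(text) is not None


def score(rows, detector):
    """Compare row-level predictions with `has_pii` and return confusion counts."""
    tp = fp = fn = tn = 0
    for row in rows:
        predicted = detector(row["text"])
        if row["has_pii"] and predicted:
            tp += 1
        elif row["has_pii"]:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Hypothetical rows shaped like the dataset records.
sample_rows = [
    {"text": "Contact alice@example.com for access.", "has_pii": True},
    {"text": "My SSN is 000-00-0000.", "has_pii": True},
    {"text": "The build finished in 42 seconds.", "has_pii": False},
]
print(score(sample_rows, detect_pii))
```

To score against the real fixtures, pass the `ds` object from the Quickstart in place of `sample_rows`. The five control rows are what expose false positives, so a detector that flags everything scores perfect recall but poor precision.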
## Related
- [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/)
- [`agentguard` on PyPI](https://pypi.org/project/agentguard-firewall/) — egress firewall (blocks data exfil at the network layer)
## License
MIT.
Provider:
mukunda1729



