zachz/pii-detection-corpus
Source: Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/zachz/pii-detection-corpus
Resource description:
---
license: mit
task_categories:
- token-classification
language:
- en
tags:
- pii
- privacy
- ner
- data-protection
- anonymization
pretty_name: PII Detection Corpus
size_categories:
- n<1K
---
# PII Detection Corpus
Synthetic dataset of text samples containing labeled PII (Personally Identifiable Information) for testing and benchmarking PII detection/scrubbing tools.
## Fields
- `text`: Text sample containing PII
- `pii_type`: Category of PII (email, phone, ssn, credit_card, ip, dob, address, passport, api_key, name, iban)
- `pii_value`: The exact PII string in the text
- `start`: Character offset in `text` where `pii_value` starts
- `end`: Character offset in `text` where `pii_value` ends
- `context`: Surrounding context category (medical, financial, support, legal, hr)
All data is **fully synthetic** and contains no real personal information.
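As a rough sketch of how the fields fit together, the snippet below checks that a record's offsets point at its `pii_value` and masks that span. It assumes `end` is exclusive (Python slice convention), which this card does not state, so verify that against real records first. The helper names `span_matches` and `redact` and the sample record are hypothetical and not part of the dataset.

```python
def span_matches(record: dict) -> bool:
    """Return True if text[start:end] equals pii_value.

    Assumption: `end` is exclusive, as in Python slicing.
    """
    return record["text"][record["start"]:record["end"]] == record["pii_value"]


def redact(record: dict, mask: str = "[REDACTED]") -> str:
    """Replace the labeled span in the text with a mask string."""
    text = record["text"]
    return text[:record["start"]] + mask + text[record["end"]:]


# Hypothetical record shaped like the fields listed above (not taken from the dataset).
_text = "Contact me at jane.doe@example.com today."
_value = "jane.doe@example.com"
_start = _text.index(_value)
example = {
    "text": _text,
    "pii_type": "email",
    "pii_value": _value,
    "start": _start,
    "end": _start + len(_value),  # exclusive end, by assumption
    "context": "support",
}

print(span_matches(example))
print(redact(example))
```

If `span_matches` fails on real records, the offsets may use an inclusive `end` or a different unit (for example bytes), and the slicing would need adjusting.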
## Usage
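```python
# Requires the Hugging Face `datasets` package (pip install datasets).
from datasets import load_dataset

# Downloads the corpus from the Hugging Face Hub.
ds = load_dataset("zachz/pii-detection-corpus")
```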
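Since the corpus is meant for benchmarking detection tools, one possible way to score a detector is exact-span recall: the fraction of labeled spans the detector returns with identical offsets. The sketch below is illustrative only. The regex detector, the `recall` helper, and the two sample records are hypothetical, `end` is assumed to be exclusive, and the records stand in for rows you would iterate from the loaded dataset.

```python
import re
from typing import Callable, Iterable

# Toy detector for illustration; real tools will be more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def detect_emails(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of email-like substrings, end exclusive."""
    return [m.span() for m in EMAIL_RE.finditer(text)]


def recall(
    records: Iterable[dict],
    detector: Callable[[str], list[tuple[int, int]]],
) -> float:
    """Fraction of labeled spans that the detector reproduces exactly."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum((r["start"], r["end"]) in detector(r["text"]) for r in records)
    return hits / len(records)


def make_record(text: str, value: str) -> dict:
    """Build a hypothetical record with the card's field names."""
    start = text.index(value)
    return {"text": text, "pii_type": "email", "pii_value": value,
            "start": start, "end": start + len(value), "context": "support"}


# Hypothetical records; the second uses an obfuscated address the regex misses.
records = [
    make_record("Email jane.doe@example.com for help.", "jane.doe@example.com"),
    make_record("Reach bob(at)example.org soon.", "bob(at)example.org"),
]

print(recall(records, detect_emails))
```

Exact matching is strict; an overlap-based criterion may suit tools that return slightly wider or narrower spans.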
## License
MIT
Provider:
zachz



