
zachz/pii-detection-corpus

Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/zachz/pii-detection-corpus
Resource overview:
---
license: mit
task_categories:
- token-classification
language:
- en
tags:
- pii
- privacy
- ner
- data-protection
- anonymization
pretty_name: PII Detection Corpus
size_categories:
- n<1K
---

# PII Detection Corpus

Synthetic dataset of text samples containing labeled PII (Personally Identifiable Information) for testing and benchmarking PII detection/scrubbing tools.

## Fields

- `text`: Text sample containing PII
- `pii_type`: Category of PII (email, phone, ssn, credit_card, ip, dob, address, passport, api_key, name, iban)
- `pii_value`: The exact PII string in the text
- `start`: Character offset start
- `end`: Character offset end
- `context`: Surrounding context category (medical, financial, support, legal, hr)

All data is **fully synthetic**: no real personal information.

## Usage

```python
from datasets import load_dataset

ds = load_dataset("zachz/pii-detection-corpus")
```

## License

MIT
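
As a rough illustration of how the fields in the card could be used for benchmarking, the sketch below checks that a record's offsets point at its `pii_value` and measures the recall of a toy email detector. The helper names (`span_matches`, `find_emails`, `recall`), the regex, and the sample record are all hypothetical and not part of the dataset. The sketch also assumes that `end` is an exclusive offset (Python slice convention), which the card does not state.

```python
import re
from typing import Callable, Iterable

Span = tuple[int, int]


def span_matches(record: dict) -> bool:
    """Return True if text[start:end] equals pii_value.

    Assumption: `end` is exclusive; the dataset card does not say.
    """
    return record["text"][record["start"]:record["end"]] == record["pii_value"]


# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[Span]:
    """Toy detector: return (start, end) spans of email-like strings."""
    return [m.span() for m in EMAIL_RE.finditer(text)]


def recall(records: Iterable[dict],
           detector: Callable[[str], list[Span]],
           pii_type: str) -> float:
    """Fraction of labeled spans of `pii_type` that the detector finds exactly."""
    relevant = [r for r in records if r["pii_type"] == pii_type]
    if not relevant:
        return 0.0
    hits = sum((r["start"], r["end"]) in detector(r["text"]) for r in relevant)
    return hits / len(relevant)


# Made-up record that follows the card's schema (not taken from the dataset).
sample = {
    "text": "Contact jane.doe@example.com for the invoice.",
    "pii_type": "email",
    "pii_value": "jane.doe@example.com",
    "start": 8,
    "end": 28,
    "context": "financial",
}

print(span_matches(sample))
print(recall([sample], find_emails, "email"))
```

With the real data, the same functions could be applied to the rows of a loaded split. The split name is not given in the card, so it would need to be checked after calling `load_dataset`. If `span_matches` fails on real rows, the exclusive-`end` assumption is the first thing to revisit.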
Provider:
zachz