
vstantch/x402-pii-corpus

Source: Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/vstantch/x402-pii-corpus
Resource description:
---
license: mit
language:
- en
tags:
- pii
- privacy
- x402
- payments
- synthetic
- named-entity-recognition
pretty_name: x402 PII Metadata Corpus
size_categories:
- 1K<n<10K
task_categories:
- token-classification
task_ids:
- named-entity-recognition
---

# x402 PII Metadata Corpus

Synthetic labelled corpus of 2,000 x402 payment metadata triples for PII filter evaluation. Released alongside the paper **"Hardening x402: Privacy-Preserving Agentic Payments via Pre-Execution Metadata Filtering"**.

- **Paper:** [arXiv:2604.11430](https://arxiv.org/abs/2604.11430) [cs.CR]
- **Canonical archive:** [IEEE DataPort doi:10.21227/kpsz-nq73](https://doi.org/10.21227/kpsz-nq73)
- **Code:** [presidio-v/presidio-hardened-x402](https://github.com/presidio-v/presidio-hardened-x402)

## Dataset description

Each record represents one x402 payment metadata triple (`resource_url`, `description`, `reason`) drawn from seven API categories. 36% of samples contain at least one synthetic PII entity injected into one of the three fields, with ground-truth span labels.

### Fields

| Field | Type | Description |
|---|---|---|
| `id` | string | Sample identifier (`syn-NNNNN`) |
| `category` | string | API category (`media`, `medical`, `financial`, `data_access`, `generic`, `compute`, `ai_inference`) |
| `resource_url` | string | Synthetic x402 resource URL |
| `description` | string | Payment description field |
| `reason` | string | Payment reason field |
| `pii_positive` | bool | `true` if any PII entity is present |
| `labels` | list | Ground-truth entity annotations (see below) |

### Label schema

```json
{
  "entity_type": "EMAIL_ADDRESS",
  "field": "description",
  "start": 24,
  "end": 45,
  "value": "<synthetic value>"
}
```

Entity types: `PERSON`, `EMAIL_ADDRESS`, `US_SSN`, `IBAN_CODE`, `CREDIT_CARD`, `PHONE_NUMBER`.
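The label schema does not say whether `end` is inclusive or exclusive, so a quick consistency check before relying on the offsets may be worthwhile. The following is a minimal sketch that assumes `start` and `end` are character offsets into the field named by `field`, with `end` exclusive (Python slice convention). The helper name `span_matches` and the sample record are hypothetical and are not taken from the corpus.

```python
# Fields of a metadata triple that a label may point into.
TRIPLE_FIELDS = ("resource_url", "description", "reason")


def span_matches(record: dict, label: dict) -> bool:
    """Return True if slicing the labelled field reproduces label['value'].

    Assumption: `end` is exclusive. If this check fails on real records,
    try `label["end"] + 1` to test the inclusive interpretation.
    """
    if label["field"] not in TRIPLE_FIELDS:
        return False
    text = record[label["field"]]
    return text[label["start"]:label["end"]] == label["value"]


# Hypothetical record built to follow the documented schema.
email = "jane.doe@example.org"
description = f"Monthly invoice sent to {email} for API usage"
start = description.index(email)

record = {
    "id": "syn-00000",
    "category": "financial",
    "resource_url": "https://api.example.com/v1/invoice",
    "description": description,
    "reason": "subscription renewal",
    "pii_positive": True,
    "labels": [
        {
            "entity_type": "EMAIL_ADDRESS",
            "field": "description",
            "start": start,
            "end": start + len(email),
            "value": email,
        }
    ],
}

all_spans_ok = all(span_matches(record, lab) for lab in record["labels"])
print(all_spans_ok)
```

Running the same check over every record of the loaded dataset would confirm or refute the exclusive-`end` assumption for the whole corpus.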

### Corpus statistics

| Split | Samples |
|---|---|
| train (all) | 2,000 |

| Entity type | Count | Share |
|---|---|---|
| PERSON | 321 | 36.7% |
| EMAIL_ADDRESS | 313 | 35.8% |
| IBAN_CODE | 96 | 11.0% |
| US_SSN | 85 | 9.7% |
| PHONE_NUMBER | 32 | 3.7% |
| CREDIT_CARD | 28 | 3.2% |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("vstantch/x402-pii-corpus")
print(ds["train"][0])
```

## Citation

```bibtex
@misc{stantchev2026hardeningx402,
  title         = {Hardening x402: Privacy-Preserving Agentic Payments via Pre-Execution Metadata Filtering},
  author        = {Stantchev, Vladimir},
  year          = {2026},
  eprint        = {2604.11430},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CR},
}
```

Corpus data also archived at IEEE DataPort:

```bibtex
@misc{stantchev2026dataset,
  author       = {Stantchev, Vladimir},
  title        = {Hardening x402: PII Filter Corpus, Sweep Results, and Live Ecosystem Data},
  year         = {2026},
  publisher    = {IEEE DataPort},
  doi          = {10.21227/kpsz-nq73},
  howpublished = {IEEE DataPort, \url{https://doi.org/10.21227/kpsz-nq73}},
}
```
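
## Example: scoring a filter against the labels

Since the corpus is intended for PII filter evaluation, one possible way to score a filter against the `labels` field is exact-match span precision, recall, and F1. The sketch below is illustrative only and is not the evaluation protocol from the paper. `toy_email_filter`, `score`, `make_label`, and the three sample records are hypothetical, and the code assumes `end` offsets are exclusive.

```python
import re
from typing import Callable, Iterable

# A predicted or gold span: (field, start, end, entity_type).
Span = tuple[str, int, int, str]

FIELDS = ("resource_url", "description", "reason")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_email_filter(record: dict) -> set[Span]:
    """Hypothetical stand-in for a real PII filter: detects emails only."""
    return {
        (field, m.start(), m.end(), "EMAIL_ADDRESS")
        for field in FIELDS
        for m in EMAIL_RE.finditer(record[field])
    }


def score(records: Iterable[dict],
          predict: Callable[[dict], set[Span]]) -> tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 over all records."""
    tp = fp = fn = 0
    for rec in records:
        gold = {(l["field"], l["start"], l["end"], l["entity_type"])
                for l in rec["labels"]}
        pred = predict(rec)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def make_label(text: str, value: str, field: str, entity_type: str) -> dict:
    """Build a schema-shaped label by locating `value` inside `text`."""
    start = text.index(value)
    return {"entity_type": entity_type, "field": field,
            "start": start, "end": start + len(value), "value": value}


# Three hypothetical records: one email, one clean, one person name.
url = "https://api.example.com/v1/report"
desc = "Receipt sent to jane.doe@example.org for March"
reason = "Refund requested by John Smith"
records = [
    {"resource_url": url, "description": desc, "reason": "subscription",
     "labels": [make_label(desc, "jane.doe@example.org",
                           "description", "EMAIL_ADDRESS")]},
    {"resource_url": url, "description": "Image resize batch",
     "reason": "compute credits", "labels": []},
    {"resource_url": url, "description": "Refund processing",
     "reason": reason,
     "labels": [make_label(reason, "John Smith", "reason", "PERSON")]},
]

precision, recall, f1 = score(records, toy_email_filter)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

To evaluate on the actual corpus, `records` could be replaced with `load_dataset("vstantch/x402-pii-corpus")["train"]`, and `toy_email_filter` with the filter under test. Exact-match scoring is strict; an overlap-based criterion may be more appropriate for `PERSON` spans, where boundary disagreements are common.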
Provider:
vstantch