
PandhereAnu/telehealth-pii-dataset

Source: Hugging Face. Updated 2026-03-24, indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/PandhereAnu/telehealth-pii-dataset
Resource description:
---
dataset_info:
  features:
  - name: tokens
    list: string
  - name: ner_tags
    list:
      class_label:
        names:
          '0': '0'
          '1': B-PATIENT
          '2': I-PATIENT
          '3': B-DOCTOR
          '4': I-DOCTOR
          '5': B-MRN
          '6': I-MRN
          '7': B-PHONE
          '8': I-PHONE
          '9': B-DATE
          '10': I-DATE
  splits:
  - name: train
    num_bytes: 483917
    num_examples: 1600
  - name: validation
    num_bytes: 60549
    num_examples: 200
  - name: test
    num_bytes: 60358
    num_examples: 200
  download_size: 607550
  dataset_size: 604824
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Telehealth PII Dataset

A synthetic dataset for training NER models to detect and redact HIPAA-sensitive PII from telehealth transcripts.

## Dataset Description

Custom built dataset with 1600 labeled sentences covering real-world telehealth scenarios. Created because real patient data is protected under HIPAA and cannot be shared publicly.

## Dataset Structure

| Split      | Size |
|------------|------|
| Train      | 1280 |
| Validation | 160  |
| Test       | 160  |

## Features

- `tokens`: list of words in each sentence
- `ner_tags`: BIO labels for each token

## Label Classes

| Label       | Description            |
|-------------|------------------------|
| O           | Not PII                |
| B/I-PATIENT | Patient name           |
| B/I-DOCTOR  | Provider name          |
| B/I-MRN     | Medical record number  |
| B/I-PHONE   | Phone number           |
| B/I-DATE    | Appointment/birth date |

## How to Use

```python
from datasets import load_dataset

dataset = load_dataset("PandhereAnu/telehealth-pii-dataset")
print(dataset)
```

## Scenarios Covered

- Receptionist to patient calls
- Doctor scheduling notes
- Pharmacy and billing calls
- Prescription refill reminders
- Hospital discharge summaries
- Emergency ward checkups
- Insurance form calls
- Nurse patient reminders

## Intended Use

Training NER models for healthcare transcript de-identification and HIPAA compliance automation.

## Limitations

- Synthetic data only
- English language
- Limited PII variety per template
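
## Example: Redacting Tagged Spans

The card describes `ner_tags` as BIO labels stored as class indices, and names de-identification as the intended use. The sketch below illustrates how those two pieces could fit together: it maps class indices back to label names and collapses each tagged span into a single `[TYPE]` placeholder. It is a minimal illustration, not the author's pipeline. The `redact` helper, the placeholder format, and the example sentence are all assumptions made for this sketch; only the label order comes from the card's metadata.

```python
# Label names in class-index order, copied from the dataset card metadata.
# Index 0 is the "outside" tag: the metadata names it '0', the label table lists it as O.
LABELS = [
    "O",
    "B-PATIENT", "I-PATIENT",
    "B-DOCTOR", "I-DOCTOR",
    "B-MRN", "I-MRN",
    "B-PHONE", "I-PHONE",
    "B-DATE", "I-DATE",
]


def redact(tokens, tag_ids):
    """Replace each tagged entity span with a single [TYPE] placeholder.

    Hypothetical helper for illustration; not part of the dataset or any library.
    """
    out = []
    prev_type = None  # entity type of the previous token, or None if it was outside
    for token, tag_id in zip(tokens, tag_ids):
        if tag_id == 0:
            # Outside any entity: keep the token unchanged.
            out.append(token)
            prev_type = None
            continue
        prefix, entity_type = LABELS[tag_id].split("-", 1)
        # Emit a placeholder on B-, or on an I- that does not continue the previous span.
        if prefix == "B" or entity_type != prev_type:
            out.append(f"[{entity_type}]")
        prev_type = entity_type
    return " ".join(out)


# Hand-written example in the card's format (an assumption, not a row from the dataset).
tokens = ["Dr.", "Rao", "will", "call", "Jane", "Doe", "on", "March", "3"]
tag_ids = [3, 4, 0, 0, 1, 2, 0, 9, 10]
print(redact(tokens, tag_ids))  # [DOCTOR] will call [PATIENT] on [DATE]
```

To try this on the dataset itself, one would presumably pass `row["tokens"]` and `row["ner_tags"]` from a split returned by `load_dataset`. In a real de-identification system the tags would come from a trained model's predictions rather than from the gold labels.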
Provider:
PandhereAnu