SAMPLE Synthetic Healthcare Encounters (Worldwide) — Zero-PII, ML-Ready CSV/Parquet for ...

Name: SAMPLE Synthetic Healthcare Encounters (Worldwide) — Zero-PII, ML-Ready CSV/Parquet for ...
Creator: Zalingo AI
License: No description available

Databricks, indexed 2025-09-16

Download link:

https://marketplace.databricks.com/details/6723e3dd-a2c7-457c-ac43-4dd1bb4bf2f5/Zalingo-AI_SAMPLE-Synthetic-Healthcare-Encounters-(Worldwide)-—-Zero-PII,-ML-Ready-CSV/Parquet-for-

Resource description:

## What this is

A privacy-safe **synthetic healthcare encounters** dataset with realistic patient-level visits and costs, **zero PII**, and modeling labels. It’s statistically calibrated to common provider/claims patterns so you can unblock **readmission, cost, and population-health POCs** immediately.

## What you get

- Encounters with diagnosis, procedure, medication, clinician/facility IDs, and visit_date
- Targets/labels: readmission_30d, risk_score
- Formats: CSV (sample), Parquet/CSV for full deliveries
- Documentation: data dictionary (included)
- Delivery: secure S3 link or private share (Snowflake/Databricks/BigQuery) on request

## Why synthetic?

- **Zero PII** and GDPR-friendly experimentation
- Start modeling **today** while production approvals are pending
- Stable, debuggable distributions for benchmarking and feature iteration

## Known limits

- Not a drop-in replacement for your production EHR/claims
- Aggregate patterns approximate real-world behavior; individual rows are synthetic

## Sample (2,000 rows)

We include a **2,000-row CSV** for structure and quick tests. Example stats:

- Coverage: 2024-09-06 → 2025-09-05 (≈12 months)
- Age: mean 53.6; median 54.0 (range 18–89)
- Cost: mean $134.22; median $105.61; p90 $251.96
- 30-day readmission: 41.2%
- Risk score: 0.524 ± 0.188
- Top diagnoses: Diabetes, Depression, Asthma

## Support & customizations

Need custom cohorts (specialty, region, code sets), schema extensions, or volume upgrades? We can generate tailored slices and refresh schedules.
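## Example: recomputing the sample stats

The summary statistics quoted for the sample could be recomputed along these lines. This is a minimal sketch using only the Python standard library, not the provider's own tooling. The columns `visit_date`, `readmission_30d`, and `risk_score` are named in the listing; the `age` and `cost` column names, the 0/1 encoding of `readmission_30d`, and the four inline rows are assumptions made purely for illustration. Check the included data dictionary for the real schema.

```python
import csv
import io
import statistics

# Tiny in-memory stand-in for the sample CSV. These rows are made up.
# `age` and `cost` are assumed column names; `readmission_30d` is assumed
# to be encoded as 0/1.
SAMPLE_CSV = """visit_date,age,cost,readmission_30d,risk_score
2024-09-06,34,80.00,0,0.30
2024-12-15,51,120.50,1,0.55
2025-03-02,67,95.25,0,0.40
2025-09-05,72,250.00,1,0.75
"""


def summarize(csv_text):
    """Return coverage, age, cost, readmission, and risk-score summaries."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ages = [int(r["age"]) for r in rows]
    costs = [float(r["cost"]) for r in rows]
    readmits = [int(r["readmission_30d"]) for r in rows]
    risks = [float(r["risk_score"]) for r in rows]
    # ISO 8601 dates sort correctly as plain strings.
    dates = sorted(r["visit_date"] for r in rows)
    return {
        "rows": len(rows),
        "coverage": (dates[0], dates[-1]),
        "age_mean": statistics.mean(ages),
        "age_median": statistics.median(ages),
        "cost_mean": statistics.mean(costs),
        "cost_median": statistics.median(costs),
        "readmission_rate": sum(readmits) / len(readmits),
        "risk_mean": statistics.mean(risks),
        "risk_std": statistics.stdev(risks),  # sample standard deviation
    }


summary = summarize(SAMPLE_CSV)
for name, value in summary.items():
    print(f"{name}: {value}")
```

To run this against the real sample, read the downloaded file's text and pass it to `summarize`, adjusting the column names to match the data dictionary.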

Provider: Zalingo AI
