SAMPLE Synthetic Healthcare Encounters (Worldwide) — Zero-PII, ML-Ready CSV/Parquet for ...
Source: Databricks · Indexed 2025-09-16
Download link:
https://marketplace.databricks.com/details/6723e3dd-a2c7-457c-ac43-4dd1bb4bf2f5/Zalingo-AI_SAMPLE-Synthetic-Healthcare-Encounters-(Worldwide)-—-Zero-PII,-ML-Ready-CSV/Parquet-for-
Description:
## What this is
A privacy-safe **synthetic healthcare encounters** dataset with realistic patient-level visits and costs, **zero PII**, and modeling labels. It’s statistically calibrated to common provider/claims patterns so you can unblock **readmission, cost, and population-health POCs** immediately.
## What you get
- Encounters with diagnosis, procedure, medication, clinician/facility IDs, and visit_date
- Targets/labels: readmission_30d, risk_score
- Formats: CSV (sample), Parquet/CSV for full deliveries
- Documentation: data dictionary (included)
- Delivery: secure S3 link or private share (Snowflake/Databricks/BigQuery) on request
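
As a quick sanity check on a delivered file, the sketch below verifies that the columns named in this listing (`visit_date`, `readmission_30d`, `risk_score`) appear in a CSV header. The helper name `missing_columns` and the example rows are hypothetical and invented for illustration. The exact names of the other columns are not given here, so they are left out rather than guessed; consult the data dictionary for them.

```python
import csv
import io

# Column names confirmed by the listing text. Other fields (diagnosis,
# procedure, medication, clinician/facility IDs) exist, but their exact
# column names are not stated, so they are not checked here.
REQUIRED_COLUMNS = {"visit_date", "readmission_30d", "risk_score"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return sorted(set(required) - header)


# Placeholder rows invented for illustration; not taken from the dataset.
EXAMPLE_CSV = (
    "visit_date,readmission_30d,risk_score\n"
    "2025-01-15,0,0.41\n"
    "2025-02-03,1,0.77\n"
)

if __name__ == "__main__":
    print(missing_columns(EXAMPLE_CSV))
```

For a real file, read its text with `open(path, newline="").read()` and pass it to the same function; an empty list means all three columns are present.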
## Why synthetic?
- **Zero PII** and GDPR-friendly experimentation
- Start modeling **today** while production approvals are pending
- Stable, debuggable distributions for benchmarking and feature iteration
## Known limits
- Not a drop-in replacement for your production EHR/claims
- Aggregate patterns approximate real-world behavior; individual rows are synthetic
## Sample (2,000 rows)
We include a **2,000-row CSV** for structure and quick tests. Example stats:
- Coverage: 2024-09-06 → 2025-09-05 (≈12 months)
- Age: mean 53.6; median 54.0 (range 18–89)
- Cost: mean $134.22; median $105.61; p90 $251.96
- 30-day readmission: 41.2%
- Risk score: 0.524 ± 0.188
- Top diagnoses: Diabetes, Depression, Asthma
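
Statistics of this kind could be recomputed from the sample along the following lines. This is a minimal sketch under stated assumptions: the column name `cost` is a guess (only `visit_date`, `readmission_30d`, and `risk_score` are named in the listing), `readmission_30d` is assumed to be coded 0/1, dates are assumed to be ISO formatted, and the "±" on the risk score is assumed to be a standard deviation. The percentile method used by `statistics.quantiles` may differ from the one behind the published p90, so small discrepancies would not be surprising.

```python
import statistics
from datetime import date


def summarize(rows):
    """Summarize encounter rows given as dicts of strings.

    Assumed keys: visit_date (ISO date), cost (hypothetical name),
    readmission_30d (0/1), risk_score (float).
    """
    costs = [float(r["cost"]) for r in rows]
    risks = [float(r["risk_score"]) for r in rows]
    dates = [date.fromisoformat(r["visit_date"]) for r in rows]
    readmits = [int(r["readmission_30d"]) for r in rows]
    return {
        "coverage": (min(dates), max(dates)),
        "cost_mean": statistics.mean(costs),
        "cost_median": statistics.median(costs),
        # Last of the nine decile cut points, i.e. the 90th percentile.
        "cost_p90": statistics.quantiles(costs, n=10)[-1],
        "readmission_rate": sum(readmits) / len(readmits),
        "risk_mean": statistics.mean(risks),
        # Sample standard deviation; the listing does not say which kind it uses.
        "risk_std": statistics.stdev(risks),
    }


# Placeholder rows invented for illustration; not taken from the dataset.
EXAMPLE_ROWS = [
    {"visit_date": "2025-01-01", "cost": "100", "readmission_30d": "0", "risk_score": "0.2"},
    {"visit_date": "2025-03-01", "cost": "200", "readmission_30d": "1", "risk_score": "0.4"},
]

if __name__ == "__main__":
    for name, value in summarize(EXAMPLE_ROWS).items():
        print(name, value)
```

Rows in this shape can be produced with `csv.DictReader` over the sample CSV, after adjusting the key names to match the data dictionary.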
## Support & customizations
Need custom cohorts (specialty, region, code sets), schema extensions, or volume upgrades? We can generate tailored slices and refresh schedules.
Provider:
Zalingo AI



