synthetic-healthcare-admissions
Source: ModelScope (魔搭社区) · Updated 2025-12-05 · Indexed 2025-12-06
Download link: https://modelscope.cn/datasets/syncora/synthetic-healthcare-admissions
Description:
# Synthetic Healthcare Admissions Dataset
### A fully synthetic **healthcare dataset** for building AI solutions in healthcare, developed using Syncora.ai.
---
## ✅ What's in This Repo?
- ✅ **Healthcare Dataset (CSV)** → [Download Here](https://huggingface.co/datasets/syncora/synthetic-healthcare-admissions/blob/main/Healthcare_Syncora_Synthetic%201.csv)
- ✅ **Example Jupyter Notebook** → [Open Notebook](https://huggingface.co/datasets/syncora/synthetic-healthcare-admissions/blob/main/Healthcare_Syncora_Synthetic_1%20(1).ipynb)
- ✅ **Use cases**
---
## 📘 About This Dataset
This **synthetic healthcare dataset** simulates hospital admission records, including demographics, billing, medications, and lab results.
It is **100% synthetic**, which helps developers, healthcare institutions, and teams training LLMs preserve privacy and meet regulatory requirements.
**Why use this dataset?**
- Explore **predictive modeling in healthcare**
- Build a **dataset for LLM training** on clinical conversations
- Safely **generate synthetic data** without exposing real patient info
---
## 🔍 Dataset Snapshot
| Column | Description |
|--------------------|----------------------------------------------------|
| **Age** | Patient age in years |
| **Gender** | 0 = Female, 1 = Male |
| **Blood Type** | Encoded blood group category (0–7) |
| **Medical Condition** | Encoded diagnosis category |
| **Billing Amount** | Hospital billing in USD |
| **Admission Type** | 0 = Emergency, 1 = Urgent, 2 = Elective |
| **Medication** | Encoded medication type |
| **Test Results** | Encoded lab test result category |
**Example row:**
`80, 1, 7, 0, 37303.07, 0, 0, 0`
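The encodings in the table can be applied to a raw row as in the minimal sketch below. It decodes only the two mappings the table documents (Gender and Admission Type) and leaves the other encoded columns as integers, since their category labels are not listed here. The names `COLUMNS`, `GENDER`, `ADMISSION_TYPE`, and `decode_row` are illustrative, not part of the dataset or any library, and the column order is assumed to match the table. For the example row, it yields an 80-year-old male patient with an Emergency admission.

```python
# Column order is assumed to follow the Dataset Snapshot table.
COLUMNS = [
    "Age", "Gender", "Blood Type", "Medical Condition",
    "Billing Amount", "Admission Type", "Medication", "Test Results",
]

# Only these two mappings are documented in the table.
GENDER = {0: "Female", 1: "Male"}
ADMISSION_TYPE = {0: "Emergency", 1: "Urgent", 2: "Elective"}


def decode_row(raw: str) -> dict:
    """Parse one comma-separated row and decode the documented categories."""
    values = [v.strip() for v in raw.split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    # Billing Amount is a float; every other column is an integer code.
    decoded = {
        name: (float(value) if name == "Billing Amount" else int(value))
        for name, value in zip(COLUMNS, values)
    }
    # Blood Type is documented as a code in the range 0-7.
    if not 0 <= decoded["Blood Type"] <= 7:
        raise ValueError("Blood Type code must be in the range 0-7")
    decoded["Gender"] = GENDER[decoded["Gender"]]
    decoded["Admission Type"] = ADMISSION_TYPE[decoded["Admission Type"]]
    return decoded


example = decode_row("80, 1, 7, 0, 37303.07, 0, 0, 0")
print(example)
```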
---
## ✅ Use Cases
This **healthcare dataset** is ideal for:
- 🏥 **Predictive Healthcare Analytics** – Predict billing amount, admission type, or risk scores
- 💊 **Medication Optimization Models** – Analyze treatment outcomes
- 🗣 **Healthcare Chatbots** – Train conversational LLMs on realistic medical workflows
- 📊 **Cost Forecasting** – Estimate hospital expenses
- 🧠 **Dataset for LLM Training** – Fine-tune models for clinical Q&A or triage
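As a possible starting point for the cost-forecasting use case, the sketch below computes a group-mean baseline: it predicts the billing amount of a new admission as the mean billing amount seen for the same admission type, falling back to the overall mean for an unseen type. Only the first pair comes from the README's example row; the other pairs are made-up placeholders so the snippet runs on its own. `fit_group_means` and `predict` are hypothetical helper names. With the real data, the pairs would come from the loaded dataset instead.

```python
from collections import defaultdict
from statistics import mean

# (admission_type, billing_amount) pairs.
# The first pair comes from the README example row; the rest are
# illustrative placeholders, NOT values from the dataset.
rows = [
    (0, 37303.07),
    (0, 20000.00),
    (1, 15000.00),
    (2, 10000.00),
    (2, 12000.00),
]


def fit_group_means(pairs):
    """Return the mean billing per admission type plus the overall mean."""
    by_type = defaultdict(list)
    for admission_type, billing in pairs:
        by_type[admission_type].append(billing)
    group_means = {key: mean(values) for key, values in by_type.items()}
    overall = mean(billing for _, billing in pairs)
    return group_means, overall


def predict(admission_type, group_means, overall):
    """Predict billing from the group mean, or the overall mean if unseen."""
    return group_means.get(admission_type, overall)


group_means, overall = fit_group_means(rows)
print(predict(2, group_means, overall))  # Elective admissions
```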
---
## 🚀 Generate Your Own Synthetic Data
Need custom scenarios? Use our tool to generate synthetic data tailored to your requirements:
👉 [**Generate your own Synthetic Data Now**](https://huggingface.co/spaces/syncora/synthetic-generation)
---
## ⚡ Quick Start
```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub and convert the train split to pandas.
dataset = load_dataset("syncora/synthetic-healthcare-admissions")
df = dataset["train"].to_pandas()
print(df.head())
```
Provider: maas
Created: 2025-08-31
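Summary
Background overview
This fully synthetic healthcare dataset simulates hospital admission records, with fields such as age, gender, medical condition, and billing amount. It is intended to support healthcare AI development, predictive analytics, and LLM training while preserving data privacy and compliance. It suits applications such as healthcare chatbots, cost forecasting, and medication optimization.
The summary above was collected and generated by 遇见数据集.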



