
gretel-patient-events-v1

Source: ModelScope community (魔搭社区). Updated 2025-11-27; indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/gretelai/gretel-patient-events-v1
Description:
# Synthetic Patient Data

This dataset is a synthetic representation of patient data, created using Gretel Navigator. It is designed to emulate realistic patient records for use in research and development scenarios. The dataset adheres to privacy and data protection standards and is licensed under the Apache 2.0 license.

## Dataset Overview

The dataset consists of 7,348 rows with the following features:

- **patient_id**: Unique identifier for each patient.
- **first_name**: First name of the patient.
- **last_name**: Last name of the patient.
- **date_of_birth**: Date of birth of the patient.
- **sex**: Sex of the patient.
- **race**: Racial background of the patient.
- **weight**: Weight of the patient (in pounds).
- **height**: Height of the patient (in inches).
- **event_id**: Identifier for each medical event. In the sample rows it is numbered sequentially per patient (1, 2, ...), so it appears to be unique only in combination with patient_id.
- **event_type**: Type of medical event (e.g., Symptom, Diagnosis Test).
- **event_date**: Date when the medical event occurred.
- **event_name**: Name of the medical event.
- **provider_name**: Name of the healthcare provider.
- **reason**: Reason for the medical event.
- **result**: Result of the medical event.
- **details**: Additional details about the medical event, stored as a JSON object.
- **notes**: Any additional notes about the patient's condition or treatment.

### Features

This dataset mixes several types of data, which makes it a useful resource for testing synthetic data models and anonymization techniques:

* Numeric: Features like weight, height, and event_id.
* Categorical: Features like sex, race, and event_type.
* Text: Features like first_name, last_name, provider_name, and reason.
* Embedded JSON: Features like details, which contain JSON objects.
* Null: Some fields may contain null values, representing missing data.
* Natural Language Text: Features like notes that contain detailed textual information.
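The mix of field types listed above means a raw row usually needs some normalization before analysis. The following is a minimal sketch, not part of the dataset's tooling: the helper names (`clean`, `parse_int`, `parse_date`, `parse_details`, `normalize_row`) are hypothetical, and it assumes that rows arrive as dictionaries of strings, that dates use the `MM/DD/YYYY` format seen in the dataset's sample rows, and that missing values may appear as `null`, `"null"`, or `"N/A"`.

```python
import json
from datetime import datetime

# Placeholder strings treated as missing (assumption based on the sample rows).
MISSING = {"", "null", "N/A"}


def clean(value):
    """Return None for missing-value placeholders, else the stripped string."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in MISSING else text


def parse_int(value):
    """Parse an integer field such as weight or height, or return None."""
    text = clean(value)
    return int(text) if text is not None else None


def parse_date(value):
    """Parse an MM/DD/YYYY string into a datetime.date, or return None."""
    text = clean(value)
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def parse_details(value):
    """Decode the embedded JSON in `details`; placeholder values become None."""
    text = clean(value)
    if text is None:
        return {}
    return {key: clean(val) for key, val in json.loads(text).items()}


def normalize_row(row):
    """Convert one raw row (a dict of strings) into typed Python values."""
    return {
        "patient_id": row["patient_id"],
        "weight": parse_int(row["weight"]),
        "height": parse_int(row["height"]),
        "event_id": parse_int(row["event_id"]),
        "event_type": clean(row["event_type"]),
        "event_date": parse_date(row["event_date"]),
        "event_name": clean(row["event_name"]),
        "result": clean(row["result"]),
        "details": parse_details(row["details"]),
    }


# Values copied from one of the dataset's sample rows.
raw = {
    "patient_id": "pmc-6203866-2",
    "weight": "165",
    "height": "70",
    "event_id": "2",
    "event_type": "Diagnosis Test",
    "event_date": "01/11/2023",
    "event_name": "CT Guided Biopsy",
    "result": "Lipoma",
    "details": '{"intensity": "null", "location": "null"}',
}
row = normalize_row(raw)
print(row["event_date"], row["details"])
```

Treating the string `"null"` inside the JSON as missing is a judgment call; if the distinction between an absent key and a literal `"null"` matters for your use case, keep the raw values instead.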
### Sample Data

Here are the first few rows of the dataset for context:

| patient_id | first_name | last_name | date_of_birth | sex | race | weight | height | event_id | event_type | event_date | event_name | provider_name | reason | result | details | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pmc-6431471-1 | Aisha | Liang | 04/17/1960 | Female | Asian | 135 | 61 | 1 | Admission | 04/17/2023 | Initial admission | Dr. Rosa Fernandez | Generalized malaise, dyspnea, cough | null | {"intensity": "N/A", "location": "N/A"} | Patient admitted with symptoms including malaise, dyspnea on exertion, and cough, exhibiting hypotension and fever on arrival. Initial laboratory tests indicated possible infection, broad-spectrum antibiotics and other treatments were administered, significantly stabilizing patient overnight. |
| pmc-6203866-2 | Alejandro | Gomez | 05/16/1978 | Male | Hispanic | 165 | 70 | 1 | Admission | 01/10/2023 | null | St. Mary's Hospital | null | null | {"intensity": "medium", "location": "thorax"} | Patient admitted for work-up related to thorax mass. |
| pmc-6203866-2 | Alejandro | Gomez | 05/16/1978 | Male | Hispanic | 165 | 70 | 2 | Diagnosis Test | 01/11/2023 | CT Guided Biopsy | Dr. Lin | null | Lipoma | {"intensity": "null", "location": "null"} | CT guided biopsy revealed a fatty mass which was diagnosed as lipoma. |

## License

This dataset is licensed under the Apache 2.0 License. You may use, distribute, and modify the dataset in accordance with the terms of this license.
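## Example: Per-Patient Timelines

In the sample data, a patient can span several rows, one per medical event (patient pmc-6203866-2 has an admission followed by a diagnosis test). A minimal sketch of regrouping rows into chronological per-patient timelines is shown below. The function name `build_timelines` is hypothetical, and the sketch assumes rows are dictionaries of strings with a non-null `event_date` in `MM/DD/YYYY` format.

```python
from collections import defaultdict
from datetime import datetime


def build_timelines(rows):
    """Group event rows by patient_id, sorted by event date and then event_id."""
    timelines = defaultdict(list)
    for row in rows:
        timelines[row["patient_id"]].append(row)
    for events in timelines.values():
        events.sort(
            key=lambda r: (
                datetime.strptime(r["event_date"], "%m/%d/%Y"),
                int(r["event_id"]),
            )
        )
    return dict(timelines)


# A subset of columns from the sample rows, deliberately out of order.
rows = [
    {"patient_id": "pmc-6203866-2", "event_id": "2",
     "event_type": "Diagnosis Test", "event_date": "01/11/2023"},
    {"patient_id": "pmc-6431471-1", "event_id": "1",
     "event_type": "Admission", "event_date": "04/17/2023"},
    {"patient_id": "pmc-6203866-2", "event_id": "1",
     "event_type": "Admission", "event_date": "01/10/2023"},
]
timelines = build_timelines(rows)
for patient_id, events in timelines.items():
    print(patient_id, [event["event_type"] for event in events])
```

Sorting on the parsed date rather than the raw string matters here, because `MM/DD/YYYY` strings do not sort chronologically across years.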
## Usage

This synthetic dataset can be used for:

- Developing and testing healthcare applications.
- Research and analysis in medical data science.
- Training machine learning models in a healthcare context.
- Any other purpose where realistic but non-sensitive patient data is required.

## Citation

If you use this dataset in your work, please cite it as follows:

```
@dataset{gretel_navigator_synthetic_patient_data,
  title = {Synthetic Patient Data},
  creator = {The Gretel.ai team, using Gretel Navigator},
  year = {2024},
  url = {https://huggingface.co/datasets/gretelai/synthetic_patient_events},
  version = {1.0},
  license = {Apache 2.0}
}
```

## Contributing

We welcome contributions to improve and expand this dataset. If you have any suggestions or improvements, please submit a pull request or open an issue on the dataset repository.

## Contact

For any questions or issues, please contact us at hi@gretel.ai, or on our Discord community at https://gretel.ai/discord

Provider: maas
Created: 2025-05-20
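Background overview

This dataset is synthetic patient data generated with Gretel Navigator. It contains 7,348 records covering features such as patient ID, name, and medical events, and is suited to healthcare application development, data science research, and machine learning model training. The dataset is released under the Apache 2.0 license and is described as privacy-compliant.

Summary collected and generated by 遇见数据集.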