The Health Gym v2.0 Synthetic Antiretroviral Therapy (ART) for HIV Dataset
Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/The_Health_Gym_v2_0_Synthetic_Antiretroviral_Therapy_ART_for_HIV_Dataset/22827878/1
Description:
This synthetic dataset, centred on ART for HIV, was generated with the model described in reference [1], which incorporates the techniques WGAN-GP+G_EOT+VAE+Buffer.

The dataset is a principal resource for the Centre for Big Data Research in Health (CBDRH) Datathon (see: CBDRH Health Data Science Datathon 2023, cbdrh-hds-datathon-2023.github.io). Its primary purpose is to support the Health Data Analytics (HDAT) courses at the University of New South Wales (UNSW) by giving students synthetic yet realistic datasets that simulate real-world data.

The dataset comprises 534,960 records across 15 columns and is stored as a 39.1 MB CSV file. It describes 8,916 synthetic patients over a period of 60 months, with data summarised monthly. The total number of records is the product of the patient count and the number of months: 8,916 × 60 = 534,960.

The 15 columns consist of 13 variables pertinent to ART for HIV as defined in reference [1], a unique patient identifier, and one variable indicating the time point.

This dataset is part of a continuing series of work that builds upon reference [2]. For further details, refer to the papers:

[1] Kuo, Nicholas I., Louisa Jorm, and Sebastiano Barbieri. "Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV." arXiv preprint arXiv:2208.08655 (2022).
[2] Kuo, Nicholas I-Hsien, et al. "The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms." Scientific Data 9.1 (2022): 693.

Latest edit: 16th May 2023.
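The stated record count (8,916 patients × 60 months = 534,960 rows) can be checked after downloading the file. The following is a minimal sketch using only the Python standard library. The column names `PatientID` and `Timestep` are hypothetical placeholders, since the description does not name the columns; substitute the identifier column found in the actual CSV header.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def summarise_panel(rows: Iterable[Mapping[str, str]], id_col: str) -> dict:
    """Count records per patient and report whether the panel is balanced."""
    per_patient = Counter(row[id_col] for row in rows)
    counts = set(per_patient.values())
    return {
        "patients": len(per_patient),
        "records": sum(per_patient.values()),
        # Balanced: every patient has the same number of monthly rows.
        "months_per_patient": counts.pop() if len(counts) == 1 else None,
    }


def summarise_csv(path: str, id_col: str) -> dict:
    """Apply summarise_panel to a CSV file with a header row."""
    with open(path, newline="") as handle:
        return summarise_panel(csv.DictReader(handle), id_col)


if __name__ == "__main__":
    # Tiny stand-in data (3 patients x 4 months), not the real file.
    # "PatientID" and "Timestep" are assumed names for illustration only.
    demo = [
        {"PatientID": str(p), "Timestep": str(t)}
        for p in range(3)
        for t in range(4)
    ]
    print(summarise_panel(demo, "PatientID"))
```

If the description is accurate, running `summarise_csv` on the real file with the correct identifier column should report 8,916 patients, 60 months per patient, and 534,960 records.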
Provider:
figshare
Created:
2023-05-16
Collected summary
Dataset introduction

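Background and challenges
Background overview
This dataset contains synthetic clinical data on antiretroviral therapy for HIV: monthly treatment records for 8,916 virtual patients over 5 years (534,960 records in total). It was created with generative adversarial network techniques and is intended mainly for health data science education. The data has 15 fields, the file size is 39.1 MB, and it is released under the CC BY 4.0 licence.
The above content was collected and summarised by 遇见数据集.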



