A Curated Multimodal Dataset for Sleep Apnea and Cardiometabolic Comorbidities (Healthcare)
Source: Mendeley Data (indexed 2026-05-21)
Download link:
https://data.mendeley.com/datasets/dms8vyw4j9
Description:
This dataset represents the Gold Layer of a Sleep Apnea Data Warehouse developed using a Medallion Architecture (Bronze–Silver–Gold) in Microsoft SQL Server. It contains 10,044 unique patient records and 47 curated analytical features integrating demographic, physiological, clinical, lifestyle, and diagnostic data for research on obstructive sleep apnea (OSA) and metabolic comorbidities.
The dataset was generated by integrating multiple structured health data sources through ETL processes, dimensional modeling, and feature engineering into a unified star-schema warehouse. Each record corresponds to a single patient and includes demographics (age, gender, occupation), metabolic indicators (BMI, glucose, insulin, HbA1c, cholesterol), cardiovascular variables (blood pressure, heart rate), sleep-related physiological measurements (AHI, oxygen saturation, EEG sleep stage, nasal airflow, chest movement), lifestyle indicators (physical activity, stress, diet, alcohol use), and diagnostic labels for sleep apnea, hypertension, and diabetes.
The Gold Layer includes engineered variables such as age bands, BMI categories, comorbidity profiles, binary health flags, and standardized analytical features optimized for machine learning and clinical analytics. The repository was designed to support predictive modeling, multi-label classification, risk stratification, clustering, and healthcare business intelligence applications.
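The kind of engineered variables described above (age bands, BMI categories, comorbidity profiles) can be sketched as follows. This is a minimal illustration, not the dataset's actual pipeline: the column names (`age`, `bmi`, `sleep_apnea`, `hypertension`, `diabetes`), the age cut points, and the profile string format are all assumptions. The BMI thresholds are the standard WHO adult categories.

```python
import bisect
from typing import Dict, Union

Number = Union[int, float]

# Assumed age cut points; the dataset's real bands are not documented here.
AGE_EDGES = [18, 40, 60]
AGE_LABELS = ["0-17", "18-39", "40-59", "60+"]

# WHO adult BMI categories (kg/m^2).
BMI_EDGES = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal", "overweight", "obese"]

# Hypothetical names for the three diagnostic flags, in a fixed order.
CONDITIONS = ["sleep_apnea", "hypertension", "diabetes"]


def age_band(age: Number) -> str:
    """Map an age in years to a band; each edge opens the next band."""
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age)]


def bmi_category(bmi: Number) -> str:
    """Map a BMI value to its WHO category."""
    return BMI_LABELS[bisect.bisect_right(BMI_EDGES, bmi)]


def comorbidity_profile(flags: Dict[str, int]) -> str:
    """Join the names of all conditions flagged 1, or return 'none'."""
    present = [name for name in CONDITIONS if flags.get(name, 0) == 1]
    return "+".join(present) if present else "none"


def engineer(record: Dict[str, Number]) -> Dict[str, object]:
    """Return a copy of the record with the derived features added."""
    enriched: Dict[str, object] = dict(record)
    enriched["age_band"] = age_band(record["age"])
    enriched["bmi_category"] = bmi_category(record["bmi"])
    enriched["comorbidity_profile"] = comorbidity_profile(
        {name: int(record.get(name, 0)) for name in CONDITIONS}
    )
    return enriched
```

Using `bisect` keeps the bin edges in one place, so the bands can be changed to match the dataset's real definitions without touching the mapping logic.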
Exported in CSV format with UTF-8 encoding, the dataset is compatible with Python, R, SQL Server, Power BI, Tableau, and statistical analysis tools. Synthetic composite identifiers are used, and no personally identifiable information is included, supporting ethical data sharing for research and educational purposes.
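A quick sanity check after download might look like the sketch below, which reads the CSV with Python's standard `csv` module and compares the shape against the counts stated in this description (10,044 records, 47 features). The file name and the identifier column name are hypothetical; substitute the real ones from the downloaded file.

```python
import csv
from typing import Dict, List, TextIO

EXPECTED_ROWS = 10_044    # unique patient records, per the description
EXPECTED_COLUMNS = 47     # curated analytical features, per the description


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read every CSV row into a dict keyed by the header names."""
    return list(csv.DictReader(handle))


def summarize(rows: List[Dict[str, str]], id_column: str) -> Dict[str, int]:
    """Count rows, columns, and distinct identifiers."""
    return {
        "rows": len(rows),
        "columns": len(rows[0]) if rows else 0,
        "unique_ids": len({row[id_column] for row in rows}),
    }


# Example usage (file and column names are assumptions):
#
# with open("sleep_apnea_gold.csv", encoding="utf-8-sig", newline="") as f:
#     summary = summarize(load_rows(f), id_column="patient_id")
# print(summary["rows"] == EXPECTED_ROWS,
#       summary["columns"] == EXPECTED_COLUMNS,
#       summary["unique_ids"] == summary["rows"])
```

The `utf-8-sig` codec reads plain UTF-8 and also strips a leading byte-order mark if the export tool wrote one, so the first header name is not corrupted either way.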
Potential applications include OSA diagnosis prediction, comorbidity risk scoring, explainable machine learning, patient segmentation, feature engineering research, and demonstration of Medallion Architecture implementation in healthcare data warehousing. This dataset also serves as a reproducible benchmark for integrating data engineering and medical analytics workflows.
Keywords: Sleep Apnea, OSA, Healthcare Analytics, Data Warehouse, Gold Layer, Medallion Architecture, Predictive Modeling, Multi-label Classification, Clinical Data Engineering.
Created:
2026-04-27



