Supporting data for "Deep learning for clustering of multivariate clinical patient trajectories with missing values"
Source: DataCite Commons; updated 2025-05-26; indexed 2025-04-15
Download link:
http://gigadb.org/dataset/100662
Resource description:
Precision medicine requires a stratification of patients by disease presentation that is informative enough to guide treatment selection on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem becomes a complex problem of clustering multivariate and relatively short time series, because (1) these diseases are multifactorial and not well described by single clinical outcome variables, and (2) disease progression needs to be monitored over time. Clinical datasets also often contain many missing values, which further complicates any clustering attempt.
The problem of clustering multivariate short time series with many missing values has not been well addressed in the literature so far. In this work, we propose a deep learning-based method to address this issue: variational deep embedding with recurrence (VaDER). VaDER builds on a Gaussian mixture variational autoencoder framework, which we extend by (1) incorporating long short-term memory (LSTM) units and (2) handling missing values directly via implicit imputation and loss re-weighting. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground-truth clustering while varying the degree of missingness. We then used VaDER to stratify Alzheimer's disease (AD) patients and Parkinson's disease (PD) patients into subgroups characterized by clinically divergent disease progression profiles. Additional analyses showed that these clinical differences reflected known underlying aspects of AD and PD.
We believe our results show that VaDER can be of great value for future efforts in patient stratification, and for multivariate short time series clustering in general.
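The abstract names two ingredients for handling missing values: implicit imputation and loss re-weighting. The sketch below illustrates one plausible reading of these ideas in plain Python. It assumes a single patient trajectory stored as a time-steps x variables matrix with `None` marking missing entries. The helper names `implicit_impute` and `masked_mse` are hypothetical, the fixed fill values stand in for what would be trainable parameters inside a network, and this is not the VaDER implementation itself; the exact weighting used in the paper may differ.

```python
from typing import List, Optional, Sequence

# One trajectory: rows are time steps, columns are clinical variables.
# None marks a missing measurement.
Trajectory = List[List[Optional[float]]]


def implicit_impute(x: Trajectory, fill: Sequence[float]) -> List[List[float]]:
    """Replace each missing entry with a per-variable fill value.

    Assumption: in a VaDER-style network these fill values would be learned
    during training; here they are plain numbers for illustration.
    """
    return [[fill[j] if v is None else v for j, v in enumerate(row)] for row in x]


def masked_mse(x: Trajectory, x_hat: List[List[float]]) -> float:
    """Reconstruction error computed over observed entries only.

    The squared error is summed over non-missing entries and divided by the
    number of observed entries, so predictions for missing values never
    contribute and the loss is normalised by how much was actually measured.
    """
    total, n_obs = 0.0, 0
    for row, row_hat in zip(x, x_hat):
        for v, v_hat in zip(row, row_hat):
            if v is not None:
                total += (v - v_hat) ** 2
                n_obs += 1
    return total / n_obs if n_obs else 0.0


if __name__ == "__main__":
    x = [[1.0, None], [3.0, 4.0]]
    print(implicit_impute(x, [0.0, 2.0]))  # [[1.0, 2.0], [3.0, 4.0]]
    # Only the three observed entries count; the 9.0 predicted for the
    # missing entry is ignored. Result is 1.25 / 3, roughly 0.4167.
    print(masked_mse(x, [[1.5, 9.0], [3.0, 3.0]]))
```

Normalising by the observed count rather than the full matrix size keeps trajectories with heavy missingness from being systematically under-weighted in the reconstruction term, which is one way to read "loss re-weighting".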
Provider:
GigaScience Database
Created:
2019-10-18



