Subtyping of common complex diseases and disorders by integrating heterogeneous data. Identifying clusters among women with lower urinary tract symptoms in the LURN study
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.f4qrfj6zd
Description:
We present a methodology for subtyping of persons with a common clinical
symptom complex by integrating heterogeneous continuous and categorical
data. We illustrate it by clustering women with lower urinary tract
symptoms (LUTS), who represent a heterogeneous cohort with overlapping
symptoms and multifactorial etiology. Data collected in the Symptoms of
Lower Urinary Tract Dysfunction Research Network (LURN), a multi-center
observational study, included self-reported urinary and non-urinary
symptoms, bladder diaries, and physical examination data for 545 women.
Heterogeneity in these multidimensional data required thorough and
non-trivial preprocessing, including scaling by controls and weighting to
mitigate data redundancy, while the various data types (continuous and
categorical) required a novel methodology based on weighted Tanimoto
indices. Data domains available only for a subset of the cohort were
integrated using a semi-supervised clustering approach. A novel contrast
criterion for determining the optimal number of clusters in consensus
clustering was introduced and compared with existing criteria.
Distinctiveness of the clusters was confirmed by using multiple criteria
for cluster quality, and by testing for significantly different variables
in pairwise comparisons of the clusters. Cluster dynamics were explored by
analyzing longitudinal data at 3- and 12-month follow-up. Five clusters of
women with LUTS were identified using the developed methodology. None of
the clusters could be characterized by a single symptom, but rather by a
distinct combination of symptoms with various levels of severity. Targeted
proteomics of serum samples demonstrated that differentially abundant
proteins and affected pathways differ across the clusters. The
clinical relevance of the identified clusters is discussed and compared
with the current conventional approaches to the evaluation of LUTS
patients. The rationale and thought process are described for the
selection of procedures for data preprocessing, clustering, and cluster
evaluation. Suggestions are provided for minimum reporting requirements in
publications utilizing clustering methodology with multiple heterogeneous
data domains.
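
Two of the preprocessing ideas named above, scaling by controls and a
weighted Tanimoto index, can be illustrated with a minimal Python sketch.
This is not the authors' implementation: the function names, the choice of
a z-score against control mean and standard deviation, and the use of the
standard continuous (generalized) Tanimoto formula with per-variable
weights are all assumptions made for illustration. The dataset and the
associated publication should be consulted for the actual definitions.

```python
import numpy as np


def scale_by_controls(cases, controls):
    """Z-score each column of `cases` using control statistics.

    Assumption: "scaling by controls" is sketched here as centering on the
    control mean and dividing by the control standard deviation.
    """
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    # Guard against constant control columns to avoid division by zero.
    sd = np.where(sd > 0, sd, 1.0)
    return (cases - mu) / sd


def weighted_tanimoto(a, b, w):
    """Weighted continuous Tanimoto similarity between two vectors.

    T = sum(w*a*b) / (sum(w*a^2) + sum(w*b^2) - sum(w*a*b))
    With 0/1 vectors and unit weights this reduces to the Jaccard index,
    so one-hot encoded categorical variables can share the same formula.
    Weights are assumed to be non-negative.
    """
    a, b, w = (np.asarray(x, dtype=float) for x in (a, b, w))
    ab = np.sum(w * a * b)
    denom = np.sum(w * a * a) + np.sum(w * b * b) - ab
    # Two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else float(ab / denom)
```

In this sketch, down-weighting a group of highly correlated variables
(smaller entries in `w`) reduces their joint influence on the similarity,
which is one simple way to mitigate data redundancy.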
Provider:
Dryad
Created:
2022-07-06



