NPJ Digit Med. 2026 Jun 23. doi: 10.1038/s41746-026-02913-x. Online ahead of print.
ABSTRACT
Electronic health records (EHRs) contain extensive multidimensional patient data, but their scale and complexity make it challenging to discover novel and meaningful clinical patterns. Unsupervised clustering of high-dimensional clinical data holds great potential for identifying such patterns. Here, we performed unsupervised clustering and characterized 100,272 patients in the Electronic Medical Records and GEnomics (eMERGE) Network. We identified 70 clusters defined by distinct comorbidity patterns. Age and sex were also strongly associated with patient stratification, influencing phenotype prevalence and onset time. Notably, phenotype onset time accurately predicted chronological age and was significantly associated with overall mortality risk. Beyond age and sex, we assessed the contribution of genetic variation to phenotype development and observed evidence of cross-phenotype associations influencing cluster membership and comorbidity patterns; however, the contribution of genetics diminished with age. We also identified several high-risk clusters with elevated Charlson Comorbidity Index (CCI) scores and validated these findings in an independent cohort. Further analysis of these clusters revealed phenotypes linked to premature aging and highlighted survival selection among older participants in observational studies. Overall, this study enables phenome-wide unsupervised patient stratification for multimorbidity discovery in largely unannotated clinical data, offering insights into comorbidity, aging, and health outcomes.
PMID:42337381 | DOI:10.1038/s41746-026-02913-x
