Life events extraction from healthcare notes for veteran acute suicide risk prediction

J Am Med Inform Assoc. 2026 Mar 16:ocaf197. doi: 10.1093/jamia/ocaf197. Online ahead of print.

ABSTRACT

OBJECTIVE: Predictive models of suicide risk have focused on features extracted from structured data in electronic health records, with limited consideration of predisposing life events (LE), such as housing instability and marital troubles, expressed in unstructured clinical text. This study expands upon previous research by demonstrating how high-performance computing (HPC) and machine learning (ML) methodologies can be used to extract and annotate 8 LE across all Veterans Health Administration (VHA) unstructured clinical text data with enriched performance metrics. Integration of the 8 LE with structured features using different statistical and ML methods is also discussed.

MATERIALS/METHODS: VHA-wide clinical text from January 2000 to January 2022 was pre-processed and analyzed using HPC. Data-driven lexicon curation enabled a rule-based annotator to extract LE, followed by machine learning to improve positive predictive value (PPV). Natural language processing (NLP) results were analyzed longitudinally, then integrated into and compared against a baseline statistical model predicting risk for a combined outcome (suicide death, suicide attempt, and overdose).

RESULTS: First-time LE mentions showed a significant temporal correlation with suicide-related events (SRE) (suicide ideation, attempt, and/or death) and were not associated with administrative bias. Predictive linear regression (LR) models integrating NLP-derived LE showed an improved area under the curve (AUC) of 0.81 and novel patient identification of up to 18%.

DISCUSSION: Our analysis shows that these methodologies improved performance metrics significantly over previous work and outperformed related work. The results demonstrate that NLP-derived LE serve as acute predictors of SRE.

CONCLUSION: Integrating NLP into predictive models may help improve clinician decision support. Future work is necessary to better define and integrate these and other potential LE.

PMID:41842607 | DOI:10.1093/jamia/ocaf197
