Evaluating a Consumer LLM for Suicide Risk Response Calibration: A Pilot Study

AI Summary

Gemini 2.5 Flash broadly matched clinical triage patterns; relational support and action orientation appeared across all suicide risk levels.
Safety escalation elements increased with higher risk, while psychoeducation declined and coping skills appeared only for non-suicidal distress.
Gemini failed to recognise implicit suicidality in one ambiguous ideation vignette; general-purpose LLMs may support supervised triage research and training but remain unsuitable for autonomous crisis intervention.

Stud Health Technol Inform. 2026 May 21;336:1009-1013. doi: 10.3233/SHTI260331.

ABSTRACT

Large language models (LLMs) are increasingly used for information seeking and self-guided support, raising safety concerns in high-risk contexts such as suicidal ideation. While prior work suggests LLMs can detect suicidal language, their ability to calibrate responses across suicide risk levels remains unclear. This pilot study evaluated Gemini 2.5 Flash using 60 Self-Directed Violence Classification System (SDVCS) clinical vignettes spanning non-suicidal distress (L0), suicidal ideation without plan (L1), and imminent suicide risk (L2) (n=20 per level). Outputs were coded for seven response elements: relational support (REL), reflection (REF), psychoeducation (PSY), coping skills (COP), action orientation (ACT), risk acknowledgment (RISK), and crisis resources (INFO). REL and ACT were common across levels. Safety escalation elements increased with risk level, whereas PSY declined as risk increased and COP was present only at the non-suicidal distress level (L0). Notably, the model failed to recognize implicit suicidality in an ambiguous ideation vignette. Overall, Gemini 2.5 Flash approximated clinical triage patterns but demonstrated critical limitations in detecting indirect suicidal expressions. These findings suggest that general-purpose LLMs may support supervised triage research and training but remain unsuitable for autonomous crisis intervention.
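
The coding scheme described in the abstract (seven response elements, three risk levels) lends itself to a simple per-level tabulation. The following is a minimal sketch of how such a tabulation might be computed, not the authors' analysis code. The function name `element_rates`, the input format, and the three toy records are assumptions made for illustration; the toy records are placeholders and do not reproduce any result from the study.

```python
# Hypothetical sketch: share of coded responses containing each element, per risk level.
# Element and level labels come from the abstract; everything else is assumed.

ELEMENTS = ("REL", "REF", "PSY", "COP", "ACT", "RISK", "INFO")
LEVELS = ("L0", "L1", "L2")


def element_rates(coded):
    """Return {level: {element: fraction of responses at that level containing it}}.

    `coded` is an iterable of (level, elements) pairs, where `elements` is the
    set of element codes a rater assigned to one model response.
    """
    counts = {lvl: dict.fromkeys(ELEMENTS, 0) for lvl in LEVELS}
    totals = dict.fromkeys(LEVELS, 0)
    for level, elements in coded:
        totals[level] += 1
        for el in elements:
            counts[level][el] += 1  # raises KeyError on an unknown element code
    return {
        lvl: {
            el: (counts[lvl][el] / totals[lvl]) if totals[lvl] else 0.0
            for el in ELEMENTS
        }
        for lvl in LEVELS
    }


# Placeholder records for illustration only (NOT study data).
toy = [
    ("L0", {"REL", "COP"}),
    ("L0", {"REL", "PSY"}),
    ("L2", {"REL", "ACT", "RISK", "INFO"}),
]
rates = element_rates(toy)
```

With real coded outputs (20 responses per level in the study design), comparing `rates["L0"]`, `rates["L1"]`, and `rates["L2"]` element by element would show whether a given element rises or falls with risk level.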

PMID:42175005 | DOI:10.3233/SHTI260331
