Accuracy of Descriptive and Illustrative Capacity of Generative Artificial Intelligence in Aesthetic Surgery: A Study of Accelerated Facial Aging in Depressed Medical Residents

AI Summary

Generative AI reliably produces descriptive texts and hyperrealistic images illustrating depression-related facial aging in medical residents, demonstrating potential clinical and educational utility.
Psychiatrists rated the outputs higher than plastic surgeons did, reflecting diverging clinical priorities: an emphasis on emotional cues versus anatomical aging markers.
ChatGPT-4o scored highest among the LLMs; among GANs, BlueWillow V5 and Stable Diffusion Ultra were preferred by psychiatrists and plastic surgeons, respectively. These findings support integrating AI into training, awareness, and diagnostic understanding.

Aesthetic Plast Surg. 2026 May 22. doi: 10.1007/s00266-026-05895-z. Online ahead of print.

ABSTRACT

BACKGROUND: With the integration of artificial intelligence (AI) into various medical specialties, its role in aesthetic surgery is rapidly gaining traction for diagnostic, educational, and therapeutic applications. Depression, a condition known to accelerate facial aging, presents a novel intersection for AI and psychiatry within cosmetic practice.

OBJECTIVE: This study aimed to evaluate the accuracy and clinical relevance of generative AI models in describing and visualizing facial features of depressed medical residents.

METHODS: We employed advanced large language models (LLMs) and generative adversarial networks (GANs), including ChatGPT models, Gemini, Midjourney, LeonardoAI, and others, to generate descriptive texts and hyperrealistic images of “depressed medical residents.” Three experienced plastic surgeons and three psychiatrists assessed the outputs using customized 7- and 8-parameter Likert scales, scoring clarity, realism, emotional nuance, resemblance to clinical cases, and overall utility. To reduce bias, all experts were blinded to the identity of the AI model that produced each output.
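The blinded rating workflow described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis: the data layout, the hypothetical function names `summarize_ratings` and `top_model_per_specialty`, and the choice to average the Likert items within each output and then across raters of the same specialty are all assumptions, since the abstract does not state how the scale items were aggregated.

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(ratings, blinding_key):
    """Average blinded Likert ratings per (model, rater specialty).

    Assumed (hypothetical) inputs:
      ratings: iterable of (specialty, output_id, item_scores), where
               item_scores lists one rater's Likert scores for one output
               (e.g. 7 or 8 items).
      blinding_key: dict mapping anonymized output_id -> model name,
                    consulted only after all ratings are collected, so
                    raters never see which model produced an output.
    """
    per_group = defaultdict(list)
    for specialty, output_id, item_scores in ratings:
        # Item-averaged score for this rater on this output
        model = blinding_key[output_id]
        per_group[(model, specialty)].append(mean(item_scores))
    # Average across all raters of the same specialty for each model
    return {group: mean(scores) for group, scores in per_group.items()}


def top_model_per_specialty(summary):
    """Return the highest-scoring model for each rater specialty."""
    best = {}
    for (model, specialty), score in summary.items():
        if specialty not in best or score > best[specialty][1]:
            best[specialty] = (model, score)
    return {specialty: model for specialty, (model, _) in best.items()}
```

Keeping the unblinding step (`blinding_key`) separate from rating collection mirrors the study's stated intent that experts score outputs without knowing the source model; comparing the per-specialty summaries is one way to surface the kind of psychiatrist-versus-surgeon divergence reported in the Results.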

RESULTS: Psychiatrists generally rated the models higher than plastic surgeons did, with ChatGPT-4o receiving the highest score among LLMs. Among GANs, BlueWillow V5 and Stable Diffusion Ultra were preferred by psychiatrists and plastic surgeons, respectively. Discrepancies in scoring reflect differing clinical priorities: emotional cues for psychiatrists versus anatomical aging markers for surgeons.

CONCLUSION: AI-generated imagery and descriptions demonstrate potential in bridging aesthetic medicine and mental health education. These tools can enhance physician awareness, clinical training, and diagnostic understanding of depression-related facial aging, especially among vulnerable populations like medical trainees.

LEVEL OF EVIDENCE V: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.

PMID:42174159 | DOI:10.1007/s00266-026-05895-z
