
Comparative Assessment of Otolaryngology Knowledge Among Large Language Models


Laryngoscope. 2024 Sep 21. doi: 10.1002/lary.31781. Online ahead of print.

ABSTRACT

OBJECTIVE: The purpose of this study was to evaluate the performance of advanced large language models from OpenAI (GPT-3.5 and GPT-4), Google (PaLM2 and MedPaLM), and an open-source model from Meta (Llama3:70b) in answering multiple-choice clinical test questions in the field of otolaryngology-head and neck surgery.

METHODS: A dataset of 4566 otolaryngology questions was used; each model was provided a standardized prompt followed by a question. One hundred questions that were answered incorrectly by all models were further interrogated to gain insight into the causes of incorrect answers.
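To make the evaluation workflow concrete, a minimal sketch of the kind of loop described in the methods (a fixed standardized prompt followed by each multiple-choice question, scored against the keyed answer) is shown below. It assumes an OpenAI-style chat API; the prompt wording, question-dictionary format, and helper names (ask_model, score) are placeholders for illustration, not details reported in the study.

```python
# Illustrative sketch only: standardized prompt + question, repeated per model.
# The exact prompt text, dataset schema, and API settings are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STANDARD_PROMPT = (
    "You are taking an otolaryngology-head and neck surgery multiple-choice exam. "
    "Answer with the single letter of the best option."
)  # placeholder wording, not the authors' prompt

def ask_model(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Send one standardized prompt plus question and return the model's letter choice."""
    formatted = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": STANDARD_PROMPT},
            {"role": "user", "content": formatted},
        ],
    )
    return response.choices[0].message.content.strip()[:1].upper()

def score(questions: list[dict], model: str = "gpt-4") -> float:
    """Accuracy over a list of {'stem', 'options', 'answer'} dicts."""
    correct = sum(
        ask_model(q["stem"], q["options"], model) == q["answer"] for q in questions
    )
    return correct / len(questions)
```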

RESULTS: GPT-4 was the most accurate, correctly answering 3520 of 4566 questions (77.1%). MedPaLM correctly answered 3223 of 4566 (70.6%) questions, while Llama3:70b, GPT-3.5, and PaLM2 were correct on 3052 of 4566 (66.8%), 2672 of 4566 (58.5%), and 2583 of 4566 (56.5%) questions, respectively. Three hundred and sixty-nine questions were answered incorrectly by all models. Prompting the models to provide reasoning improved accuracy in every case: GPT-4 changed from an incorrect to a correct answer 31% of the time, while GPT-3.5, Llama3, PaLM2, and MedPaLM corrected their responses 25%, 18%, 19%, and 17% of the time, respectively.
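The reasoning re-prompt reported in the results can be pictured in the same way: a missed question is re-asked with an instruction to reason step by step before committing to a final letter. The sketch below is again an assumption-laden illustration; the reasoning prompt text and the reprompt_with_reasoning name are not taken from the paper.

```python
# Illustrative follow-up step for the reasoning analysis; prompt wording and
# function name are assumptions, not the authors' protocol.
from openai import OpenAI

client = OpenAI()

REASONING_PROMPT = (
    "Reason through the relevant anatomy, physiology, and clinical evidence "
    "step by step, then state your final answer as a single letter on the last line."
)  # placeholder wording

def reprompt_with_reasoning(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Re-ask one missed question, requesting explicit reasoning before the answer."""
    formatted = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REASONING_PROMPT},
            {"role": "user", "content": formatted},
        ],
    )
    # Take the first character of the last non-empty line as the letter choice.
    last_line = response.choices[0].message.content.strip().splitlines()[-1]
    return last_line.strip()[:1].upper()
```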

CONCLUSION: Large language models vary in their understanding of otolaryngology-specific clinical knowledge. OpenAI's GPT-4 has a strong understanding of core concepts as well as detailed information in the field of otolaryngology. Its baseline understanding of this field makes it well suited to serve in roles related to head and neck surgery education, provided that appropriate precautions are taken and potential limitations are understood.

LEVEL OF EVIDENCE: N/A Laryngoscope, 2024.

PMID:39305216 | DOI:10.1002/lary.31781
