Welcome to PsychiatryAI.com: [PubMed] - Psychiatry AI Latest

Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations


BMC Med Educ. 2024 Sep 3;24(1):962. doi: 10.1186/s12909-024-05881-6.

ABSTRACT

BACKGROUND: This study aimed to answer the research question: How reliable is ChatGPT in automated essay scoring (AES) for oral and maxillofacial surgery (OMS) examinations for dental undergraduate students compared to human assessors?

METHODS: Sixty-nine undergraduate dental students participated in a closed-book examination comprising two essays at the National University of Singapore. Using pre-created assessment rubrics, three assessors independently performed manual essay scoring, while one separate assessor performed AES using ChatGPT (GPT-4). Data analyses were performed using the intraclass correlation coefficient and Cronbach’s α to evaluate the reliability and inter-rater agreement of the test scores among all assessors. The mean scores of manual versus automated scoring were evaluated for similarity and correlations.
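The two reliability statistics named in the methods can be computed directly from a students-by-raters score matrix. The sketch below is a minimal illustration (not the study's actual analysis code or data): it implements Cronbach's α from the per-rater and total-score variances, and a single-rater consistency ICC, ICC(C,1), from two-way ANOVA mean squares. The score matrix is hypothetical, chosen only to show the calculation.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x k_raters) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each rater's scores
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def icc_consistency(scores):
    """Single-rater consistency ICC, ICC(C,1), via two-way ANOVA mean squares."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between-student
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()  # between-rater
    ss_total = ((scores - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical data: 4 essays, each scored by 3 raters (illustrative only)
scores = np.array([[8, 7, 8],
                   [5, 6, 5],
                   [9, 9, 10],
                   [3, 4, 3]], dtype=float)
print(round(cronbach_alpha(scores), 3))   # 0.98
print(round(icc_consistency(scores), 3))  # 0.942
```

Values above roughly 0.9 for α and 0.75 for ICC are conventionally read as high reliability and excellent agreement, which is the interpretation scale the results below appear to use.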

RESULTS: A strong correlation was observed between AES and all manual scorers for Question 1 (r = 0.752-0.848, p < 0.001) and a moderate correlation for Question 2 (r = 0.527-0.571, p < 0.001). Intraclass correlation coefficients of 0.794-0.858 indicated excellent inter-rater agreement, and Cronbach's α of 0.881-0.932 indicated high reliability. For Question 1, the mean AES scores were similar to those for manual scoring (p > 0.05), and there was a strong correlation between AES and manual scores (r = 0.829, p < 0.001). For Question 2, AES scores were significantly lower than manual scores (p < 0.001), and there was a moderate correlation between AES and manual scores (r = 0.599, p < 0.001).

CONCLUSION: This study shows the potential of ChatGPT for essay marking. However, appropriate rubric design is essential for optimal reliability. With further validation, ChatGPT has the potential to aid students in self-assessment or in large-scale automated marking processes.

PMID:39227811 | DOI:10.1186/s12909-024-05881-6
