Evidence
BMC Res Notes. 2024 Sep 3;17(1):247. doi: 10.1186/s13104-024-06920-7.
ABSTRACT
OBJECTIVE: The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in different languages to answer complex questions is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models’ responses to 40 virology MCQs were assessed for correctness and quality using the CLEAR tool, which is designed for the evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom’s taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.
RESULTS: ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores: 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. Both AI models performed better on lower cognitive domain questions. ChatGPT-4 and Gemini both exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
PMID:39228001 | DOI:10.1186/s13104-024-06920-7
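Because each language condition used the same 40 MCQs, the reported correctness percentages map back to raw counts out of 40. A minimal sketch of that mapping, with a pooled two-proportion z-test added purely as an illustration of how such proportions could be compared (this test is not the analysis reported in the paper):

```python
from math import sqrt, erf

N_QUESTIONS = 40  # number of virology MCQs per language condition

def correct_count(pct, n=N_QUESTIONS):
    """Convert a reported correctness percentage back to a raw count."""
    return round(pct / 100 * n)

def two_proportion_z(count_a, count_b, n=N_QUESTIONS):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = count_a / n, count_b / n
    pooled = (count_a + count_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Reported correctness: ChatGPT-4 vs. Gemini
english = (correct_count(80), correct_count(62.5))  # 32 vs. 25 of 40
arabic = (correct_count(65), correct_count(55))     # 26 vs. 22 of 40

for label, (a, b) in [("English", english), ("Arabic", arabic)]:
    z, p = two_proportion_z(a, b)
    print(f"{label}: {a}/40 vs. {b}/40, z={z:.2f}, p={p:.3f}")
```

With only 40 items per condition, such pairwise differences are small in absolute terms, which is one reason the abstract frames the gap descriptively rather than inferentially.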
The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses
AI Virtual Reality Related Evidence Matrix (90 Days)
- Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic
- Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study
- Can artificial intelligence models serve as patient information consultants in orthodontics?
- Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial
- Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study
- Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study
- Evaluating generative AI responses to real-world drug-related questions
- Basal knowledge in the field of pediatric nephrology and its enhancement following specific training of ChatGPT-4 "omni" and Gemini 1.5 Flash
- End-of-life Care Patient Information Leaflets-A Comparative Evaluation of Artificial Intelligence-generated Content for Readability, Sentiment, Accuracy, Completeness, and Suitability: ChatGPT vs Google Gemini
- Comparison of Artificial Intelligence to Resident Performance on Upper-Extremity Orthopaedic In-Training Examination Questions
- Appraisal of ChatGPT's Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination
- Can AI Answer My Questions? Utilizing Artificial Intelligence in the Perioperative Assessment for Abdominoplasty Patients
- Do ChatGPT and Gemini Provide Appropriate Recommendations for Pediatric Orthopaedic Conditions?
- The Performance of ChatGPT-4 and Gemini Ultra 1.0 for Quality Assurance Review in Emergency Medical Services Chest Pain Calls
- Artificial Intelligence-Generated Patient Education Materials for Helicobacter pylori Infection: A Comparative Analysis
- Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination
- Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists
- An assessment of ChatGPT's responses to frequently asked questions about cervical and breast cancer
- Cross-Cultural Adaptation and Validation of Nursing Stress Scale in Jordanian Nurses
- Using large language model to guide patients to create efficient and comprehensive clinical care message
- ChatGPT has Educational Potential: Assessing ChatGPT Responses to Common Patient Hip Arthroscopy Questions
- Performance of Artificial Intelligence Content Detectors Using Human and Artificial Intelligence-Generated Scientific Writing
- Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4
- Doctor AI? A pilot study examining responses of artificial intelligence to common questions asked by geriatric patients
- Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese
- Assessing the appropriateness and completeness of ChatGPT-4's AI-generated responses for queries related to diabetic retinopathy
- Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment
- The Use of Generative Artificial Intelligence for Improving Health Literacy in Reproductive Health: A Case Study
- Exploring the potential of artificial intelligence to enhance the writing of english academic papers by non-native english-speaking medical students - the educational application of ChatGPT
- Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study
- Comparing ChatGPT and a Single Anesthesiologist's Responses to Common Patient Questions: An Exploratory Cross-Sectional Survey of a Panel of Anesthesiologists
- Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain
- Amplifying Chinese physicians' emphasis on patients' psychological states beyond urologic diagnoses with ChatGPT-A multi-center cross-sectional study
- Quality and correctness of AI-generated versus human-written abstracts in psychiatric research papers
- Appropriateness of ChatGPT as a resource for medication-related questions
- Evaluation of online chat-based artificial intelligence responses about inflammatory bowel disease and diet
- A Review of Ophthalmology Education in the Era of Generative Artificial Intelligence
- Utilizing Artificial Intelligence Application for Diagnosis of Oral Lesions and Assisting Young Oral Histopathologist in Deriving Diagnosis from Provided Features - A Pilot study
- ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study
- Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey
- Parental concerns about oral health of children: Is ChatGPT helpful in finding appropriate answers?
- Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis
- Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study
- Artificial Versus Human Intelligence in the Diagnostic Approach of Ophthalmic Case Scenarios: A Qualitative Evaluation of Performance and Consistency
- Registered nurses' perceptions of nursing student preceptorship: Content analysis of open-ended survey questions
- Prompt Engineering for Nurse Educators
- LGBTQIA health in medical education: a national survey of Australian medical students
- Large Language Model-Based Responses to Patients' In-Basket Messages
- An Evening with Natalka Vorozhbyt
- Aesthetic and Symbolic Dimensions of Arabic Writing (ASDAW) Symposium
- New training, new attitudes: non-clinical components in Ukrainian medical PHDs training (regarding critical thinking, academic integrity and artificial intelligence use)
- ChatGPT can help guide and empower patients after prostate cancer diagnosis
- Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases
- Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study
- Online content on eating disorders: a natural language processing study
- Impact of the Nursing and Midwifery Council (2018) future nurse: Standards of proficiency for registered nurses on children's nursing curriculum - A cross-sectional study
- Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models
More Evidence
AI Virtual Reality Related Evidence Matrix (365 Days)
(recent articles with at least 5 words in title)
- Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom's Taxonomy
- ChatGPT as a teaching tool: Preparing pathology residents for board examination with AI-generated digestive system pathology tests
- Large language models for generating medical examinations: systematic review
- Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment
- The Effect of Using an Arabic Assistive Application on Improving the Ability of Children with Autism Spectrum Disorder to Comprehend and Answer Content Questions
- Comparing the Performance of Artificial Intelligence Learning Models to Medical Students in Solving Histology and Embryology Multiple Choice Questions
- Language-adaptive artificial intelligence: assessing CHATGPT'S answer to frequently asked questions on total hip arthroplasty questions
- Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy
- Crafting medical MCQs with generative AI: A how-to guide on leveraging ChatGPT
- Unlocking Health Literacy: The Ultimate Guide to Hypertension Education From ChatGPT Versus Google Gemini
- Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study
- Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications
- Evaluating a Large Language Model's Ability to Answer Clinicians' Requests for Evidence Summaries
- Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam
- Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum
- Assessing ChatGPT's Responses to Otolaryngology Patient Questions
- Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis
- Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis
- Evaluating the Performance of ChatGPT in Urology: A Comparative Study of Knowledge Interpretation and Patient Guidance
- Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media
- Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study
- Appropriateness of Frequently Asked Patient Questions Following Total Hip Arthroplasty From ChatGPT Compared to Arthroplasty-Trained Nurses
- Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports
- Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery
- Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced
- Frequently asked questions on erectile dysfunction: evaluating artificial intelligence answers with expert mentorship
- Evaluating the accuracy and relevance of ChatGPT responses to frequently asked questions regarding total knee replacement
- Computerized diagnostic decision support systems - a comparative performance study of Isabel Pro vs. ChatGPT4
- Accuracy and consistency of online large language model-based artificial intelligence chat platforms in answering patients' questions about heart failure
- neuroGPT-X: toward a clinic-ready large language model
- Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content
- Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison
- GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination
- Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions
- ChatGPT: May Help Inform Patients in Dental Implantology
- ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study
- Assessing the Quality of ChatGPT Responses to Dementia Caregivers' Questions: Qualitative Analysis
- ChatGPT and the German board examination for ophthalmology: an evaluation
- Response accuracy of ChatGPT 3.5 Copilot and Gemini in interpreting biochemical laboratory data a pilot study
- Comparing ChatGPT and clinical nurses' performances on tracheostomy care: A cross-sectional study
- Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis
- Psychiatrists’ experiences and opinions of generative artificial intelligence in mental healthcare: An online mixed methods survey
- The utility of ChatGPT as a generative medical translator
- A Comparative Study of Responses to Retina Questions from Either Experts, Expert-Edited Large Language Models, or Expert-Edited Large Language Models Alone
- Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions
- Validation and cross-cultural adaptation of the six-dimension scale of nursing performance- arabic version
- Evaluating the effectiveness of artificial intelligence-based tools in detecting and understanding sleep health misinformation: Comparative analysis using Google Bard and OpenAI ChatGPT-4