Przegląd Dermatologiczny

Abstract

1/2024 vol. 111
Original article

Exploring the performance of ChatGPT-3.5 in addressing dermatological queries: a research investigation into AI capabilities

  1. Students’ Scientific Association of Computer Analysis and Artificial Intelligence at the Department of Radiology and Nuclear Medicine of the Medical University of Silesia, Katowice, Poland
  2. Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine, Medical University of Silesia, Katowice, Poland
  3. Department of Radiology and Nuclear Medicine, Medical University of Silesia, Katowice, Poland
  4. Multi-specialty District Hospital S.A. Dr. B. Hager Pyskowicka, Tarnowskie Góry, Poland
  5. Department of Medical and Molecular Biology, Faculty of Medical Sciences in Zabrze, Medical University of Silesia in Katowice, Poland
  6. Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Dermatol Rev/Przegl Dermatol 2024, 111, 26-30
Online publish date: 2024/06/28

Introduction:

Rapid technological advancement has brought ChatGPT-3.5, an artificial intelligence (AI) language model, under scrutiny for its potential application in dermatology. Using 119 questions from the National Specialist Examination (PES), we assess ChatGPT-3.5’s performance, compare it with human skills, and address the ethical implications.

Objective:

Our primary aim is to evaluate ChatGPT-3.5’s proficiency in responding to 119 dermatology questions from the PES. The study emphasizes ethical considerations and compares the model’s knowledge and skills to those of human dermatologists.

Material and methods:

Questions from the 2023 PES question database were categorized according to Bloom’s taxonomy and by thematic content. ChatGPT-3.5 (version of 3 August 2023) answered the 119 questions in five separate sessions, allowing for a probabilistic evaluation of its responses. Statistical analyses, conducted in RStudio, assessed correctness, confidence, and difficulty.
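The abstract does not state how the five sessions were combined into one answer per question. As a minimal sketch only, assuming a simple majority vote, the Python snippet below takes the most frequent answer across sessions and reports the share of sessions that agree with it as a rough consistency measure. The function name `aggregate_sessions` and the sample answers are hypothetical and are not taken from the study.

```python
from collections import Counter


def aggregate_sessions(answers):
    """Return (modal_answer, agreement) for one question.

    Assumption: the final answer is the most frequent one across sessions.
    `agreement` is the fraction of sessions that gave that answer.
    """
    counts = Counter(answers)
    modal_answer, votes = counts.most_common(1)[0]
    return modal_answer, votes / len(answers)


# Hypothetical answers from five sessions for two questions (not study data).
sessions = {
    "Q1": ["A", "A", "B", "A", "A"],
    "Q2": ["C", "D", "C", "E", "C"],
}

for question, answers in sessions.items():
    answer, agreement = aggregate_sessions(answers)
    print(f"{question}: answer={answer}, agreement={agreement:.0%}")
```

Repeating each question makes it possible to separate answers the model gives consistently from answers that change between sessions.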

Results:

ChatGPT-3.5 achieved a correct response rate of 49.58%, below the 60% passing threshold. No significant differences in difficulty, and no correlations between difficulty and certainty, were observed. Performance varied across question types, highlighting the model’s strengths and weaknesses. Although the overall result was suboptimal, this differential performance offers insights into where future improvements are needed. The study advocates ongoing research into AI integration in dermatology and envisions a promising role for AI in assisting dermatologists.
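The reported percentages can be related to question counts with a short arithmetic check. The Python sketch below assumes that the 49.58% rate and the 60% threshold both apply to all 119 questions. The resulting counts are inferred from the percentages and are not stated in the abstract.

```python
import math

TOTAL_QUESTIONS = 119   # from the abstract
REPORTED_RATE = 49.58   # percent correct, from the abstract
PASS_THRESHOLD = 60     # percent required to pass, from the abstract

# Integer counts of correct answers whose rate rounds to the reported value.
matching_counts = [
    k for k in range(TOTAL_QUESTIONS + 1)
    if abs(100 * k / TOTAL_QUESTIONS - REPORTED_RATE) < 0.005
]

# Smallest number of correct answers that reaches the threshold.
needed_to_pass = math.ceil(PASS_THRESHOLD * TOTAL_QUESTIONS / 100)

# Gap between the inferred score and the passing score.
shortfall = needed_to_pass - matching_counts[0]

print(matching_counts, needed_to_pass, shortfall)
```

Under these assumptions, the reported rate corresponds to 59 correct answers out of 119, while at least 72 would be needed to reach 60%, a gap of 13 questions.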

Conclusions:

Attention to ethical considerations is crucial for introducing AI effectively, minimizing errors, and enhancing the quality of dermatological healthcare. The findings support optimism about AI’s evolving role in dermatology.
