Family Medicine & Primary Care Review

Abstract

1/2025 vol. 27
Original paper

Investigating the use of a large language model in general practice

  1. School of Medicine, University College Cork, Cork, Ireland
  2. School of Medicine, University of Limerick, Limerick, Ireland
Family Medicine & Primary Care Review 2025; 27(1): 19–24
Online publish date: 2025/03/26

Background

Large language models have demonstrated strong performance on many tasks and, in particular, have been shown to pass many medical knowledge tests. However, the majority of these studies do not make use of fine-tuning.

Objectives

To evaluate the suitability of fine-tuned large language models for general practice by measuring their performance on the Applied Knowledge Test (AKT) of the Royal College of General Practitioners.

Material and methods

We evaluate the performance of ChatGPT 3.5 in three distinct cases using publicly available practice questions from the Royal College of General Practitioners. In the baseline case, each question is input as-is and the answer recorded. In the second case, prompt engineering is applied before the questions are input. In the third case, the model is fine-tuned on a subset of the questions and evaluated on the remaining ones.
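The hold-out evaluation in the third case can be illustrated with a minimal Python sketch. It is not the authors' code: the question format, the 50% split, the random seed, and the names `split_questions`, `accuracy`, and `always_a` are all assumptions made for illustration, and a trivial stand-in replaces the language model.

```python
import random


def split_questions(questions, train_fraction=0.5, seed=0):
    """Shuffle questions and partition them into fine-tuning and held-out sets.

    train_fraction and seed are illustrative assumptions; the abstract does
    not state how the subset used for fine-tuning was chosen.
    """
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(answer_fn, questions):
    """Return the percentage of questions where answer_fn matches the key."""
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(1 for q in questions if answer_fn(q["stem"]) == q["answer"])
    return 100.0 * correct / len(questions)


# Toy multiple-choice items standing in for practice questions (hypothetical).
questions = [
    {"stem": f"Question {i}", "answer": "ABCDE"[i % 5]} for i in range(20)
]

train, held_out = split_questions(questions, train_fraction=0.5, seed=0)


def always_a(stem):
    """Stand-in for a model: always answers 'A'."""
    return "A"


# Only the held-out questions are scored, so no item seen during
# fine-tuning contributes to the reported mark.
held_out_score = accuracy(always_a, held_out)
```

In a real evaluation, `always_a` would be replaced by a call to the baseline, prompted, or fine-tuned model, and the same held-out set would be scored for each case.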

Results

The fine-tuned model outperforms both the baseline case (p = 0.005) and the prompt engineering case (p = 0.010). Furthermore, it achieves a passing mark on the AKT, with a mean score of 72.03%.

Conclusions

With further development, fine-tuned large language models could potentially be used by general practitioners to facilitate areas of their practice. Care must be taken to ensure that the models conform to stringent standards, so as to avoid misinforming patients or misguiding care.
