Investigating the use of a large language model in general practice

Wesley Chorney; Sing Hui Ling

doi:10.5114/fmpcr.2025.146965

Abstract

1/2025 vol. 27

Original paper

Wesley Chorney ¹, Sing Hui Ling ²

School of Medicine, University College Cork, Cork, Ireland
School of Medicine, University of Limerick, Limerick, Ireland

Family Medicine & Primary Care Review 2025; 27(1): 19–24

DOI: https://doi.org/10.5114/fmpcr.2025.146965

Online publish date: 2025/03/26

Background

Large language models have demonstrated strong performance on many tasks. In particular, they have been shown to pass many medical knowledge tests. However, the majority of these studies do not make use of fine-tuning.

Objectives

To assess the suitability of fine-tuned large language models for general practice by measuring their performance on the Applied Knowledge Test (AKT) of the Royal College of General Practitioners.

Material and methods

We evaluate the performance of ChatGPT 3.5 in three distinct cases using publicly available practice questions from the Royal College of General Practitioners. In the baseline case, each question is simply input and the answer recorded. In the second case, prompt engineering is applied before the questions are posed. In the third case, the model is fine-tuned on a subset of the questions and evaluated on the remaining ones.
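
The three cases can be illustrated with a minimal Python sketch. It is not the authors' code: the question format, the split fraction, the preamble text, and the stub `always_a` (which stands in for a real model call) are all assumptions made for illustration, and the fine-tuning step itself is omitted.

```python
import random


def split_questions(questions, tune_fraction=0.5, seed=0):
    """Shuffle and split questions into a fine-tuning set and a held-out set.

    The 50/50 default is an assumption; the abstract does not give the ratio.
    """
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * tune_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_prompt(question, preamble=None):
    """Baseline case: the question alone. Prompt-engineering case: preamble first."""
    options = "\n".join(
        f"{label}. {text}" for label, text in question["options"].items()
    )
    body = question["stem"] + "\n" + options
    return body if preamble is None else preamble + "\n\n" + body


def accuracy(questions, answer_fn, preamble=None):
    """Percentage of questions for which answer_fn returns the keyed answer."""
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(
        answer_fn(build_prompt(q, preamble)) == q["answer"] for q in questions
    )
    return 100.0 * correct / len(questions)


# Placeholder items, not real AKT questions.
toy_questions = [
    {"stem": f"Placeholder question {i}", "options": {"A": "x", "B": "y"}, "answer": key}
    for i, key in enumerate(["A", "B", "A", "A"])
]


def always_a(prompt):
    """Hypothetical stand-in for a model call; always answers 'A'."""
    return "A"


tune_set, eval_set = split_questions(toy_questions)
baseline_score = accuracy(eval_set, always_a)
prompted_score = accuracy(
    eval_set, always_a, preamble="Answer with a single option letter."
)
```

In the fine-tuned case, `tune_set` would be used to train the model and `accuracy` would then be computed on `eval_set` only, so that no evaluation question is seen during training.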

Results

The fine-tuned model outperforms both the baseline case (p = 0.005) and the prompt engineering case (p = 0.010). Furthermore, the fine-tuned model achieves a passing mark on the AKT, with a mean score of 72.03%.
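
The abstract does not name the statistical test behind these p-values. Purely as an illustration of how two sets of scores might be compared, the sketch below runs an exact two-sided permutation test on the difference in means. The function name and the score lists are invented placeholders, not the study's data or method.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of relabelling the pooled scores and returns the
    fraction of relabellings whose mean difference is at least as large as
    the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Invented placeholder scores (percent correct per run), for illustration only.
fine_tuned_runs = [70, 74, 72, 73]
baseline_runs = [61, 65, 63, 60]
p_value = permutation_p_value(fine_tuned_runs, baseline_runs)
```

Exact enumeration is only practical for small samples; with many runs, a randomly sampled subset of relabellings would be the usual substitute.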

Conclusions

With further development, fine-tuned large language models could potentially be used by general practitioners to facilitate areas of their practice. Care must be taken to ensure that the models conform to stringent standards to avoid misinforming patients or misguiding care.

Keywords

artificial intelligence, general practice, medicine
