Are large language models capable of generating safe and guideline-concordant
rehabilitation recommendations in orthopaedics and sports medicine? A review

Bartosz Gołembiewski; Grzegorz Mulski; Martyna Manicka; Michał Baranowicz; Weronika Mazurkiewicz; Zuzanna Borecka; Agnieszka Sobczak; Alicja Szymczak; Anna Jaworowicz; Joanna Piasecka; Łukasz Łapaj

doi:10.26444/jpccr/222263

Online first

CC-BY 4.0

REVIEW PAPER

Bartosz Gołembiewski ^1,2

Grzegorz Mulski ^1,2

Martyna Manicka ^1,2

Michał Baranowicz ^1,2

Weronika Mazurkiewicz ^1,2

Zuzanna Borecka ^1,2

Agnieszka Sobczak ^1,2

Alicja Szymczak ^1,2

Anna Jaworowicz ^1,2

Joanna Piasecka ^1,2

Łukasz Łapaj ¹

Department of General Orthopedics, Musculoskeletal Oncology and Trauma Surgery, University of Medical Sciences, Poznań, Poland

Student Scientific Society of Orthopaedics and Musculoskeletal Traumatology, University of Medical Sciences, Poznań, Poland

These authors contributed equally to this work

Corresponding author

Bartosz Gołembiewski

Department of General Orthopaedics, Musculoskeletal Oncology and Trauma Surgery, University of Medical Sciences, Poznań, Poland.

KEYWORDS

large language models

artificial intelligence

orthopaedics

rehabilitation

sports medicine

TOPICS

Medicine

ABSTRACT

Introduction and objective:
Large language models (LLMs) are increasingly used in medicine, including orthopaedics and sports medicine, where they may support the development of rehabilitation recommendations and decisions on return to physical activity. However, concerns remain regarding their safety and adherence to clinical guidelines. The aim of this review is to summarize the available evidence on the quality of LLM-generated recommendations, their concordance with current guidelines, and their potential clinical implications.

Review methods:
A narrative review of the literature was conducted using the MEDLINE (PubMed) and Scopus databases, covering the period from November 2022 (after the widespread introduction of models such as ChatGPT) to 1 March 2026. English-language studies evaluating LLM-generated rehabilitation recommendations, their concordance with clinical guidelines, and safety aspects were included.

Brief description of the state of knowledge:
LLMs can generate coherent and often useful rehabilitation recommendations, although their quality is variable. Many studies report only partial concordance with clinical guidelines, with key elements – such as exercise parameters or progression criteria – frequently omitted. Another important limitation is the sensitivity of responses to prompt formulation. While the recommendations are generally reasonable, they often require specialist verification before use in clinical practice.

Summary:
LLMs may serve as a valuable tool to support rehabilitation, particularly in patient education and treatment planning; however, they should not be considered a standalone source of clinical recommendations. Their use requires specialist oversight and further validation in clinical studies.

eISSN:	1898-7516
ISSN:	1898-2395