REVIEW PAPER
Are large language models capable of generating safe and guideline-concordant rehabilitation recommendations in orthopaedics and sports medicine? A review
 
1
Department of General Orthopedics, Musculoskeletal Oncology and Trauma Surgery, University of Medical Sciences, Poznań, Poland
 
2
Student Scientific Society of Orthopaedics and Musculoskeletal Traumatology, University of Medical Sciences, Poznań, Poland
 
These authors contributed equally to this work
 
 
Corresponding author
Bartosz Gołembiewski   

Department of General Orthopaedics, Musculoskeletal Oncology and Trauma Surgery, University of Medical Sciences, Poznań, Poland
 
 
 
ABSTRACT
Introduction and objective:
Large language models (LLMs) are increasingly used in medicine, including orthopaedics and sports medicine, where they may support the development of rehabilitation recommendations and decisions on return to physical activity. However, concerns remain regarding their safety and adherence to clinical guidelines. This review aims to summarize the available evidence on the quality of LLM-generated recommendations, their concordance with current guidelines, and their potential clinical implications.

Review methods:
A narrative review of the literature was conducted using the MEDLINE (PubMed) and Scopus databases, covering the period from November 2022 (after the widespread introduction of models such as ChatGPT) to 1 March 2026. English-language studies evaluating LLM-generated rehabilitation recommendations, their concordance with clinical guidelines, and safety aspects were included.

Brief description of the state of knowledge:
LLMs can generate coherent and often useful rehabilitation recommendations, although their quality is variable. Many studies report only partial concordance with clinical guidelines, with key elements – such as exercise parameters or progression criteria – frequently omitted. Another important limitation is the sensitivity of responses to prompt formulation. While the recommendations are generally reasonable, they often require specialist verification before their use in clinical practice.

Summary:
LLMs may serve as a valuable tool to support rehabilitation, particularly in patient education and treatment planning; however, they should not be considered a standalone source of clinical recommendations. Their use requires specialist oversight and further validation in clinical studies.
eISSN:1898-7516
ISSN:1898-2395