Clinical question: Does large language model (LLM) assistance improve physician performance on open-ended management reasoning tasks compared to conventional resources?
Background: Management reasoning is a newer field in clinical reasoning that covers decision-making around testing, treatment, goals of care, and availability of resources. Unlike diagnostic reasoning, management often has no single correct answer and requires prioritization, ongoing monitoring, and communication with the patient. While LLMs have shown effectiveness in diagnostic reasoning, little is known about their performance in management reasoning.
Study design: Prospective, randomized, controlled trial
Setting: Virtual; participants took part either remotely or at an in-person computer laboratory
Synopsis: A total of 92 physicians were enrolled from November 2023 to April 2024; most were attendings, and most had internal medicine training. A total of 400 clinical vignettes were completed: 176 by physicians using the LLM, 199 by physicians using conventional resources (e.g., UpToDate, Google), and 25 by the LLM alone. An iterative modified Delphi process was used to refine the management rubric used to score each case. Physicians using the LLM scored higher than those using conventional resources (43.0% versus 35.7%, 6.5-percentage-point difference, P<0.001). There was no statistically significant difference between physicians using the LLM and the LLM alone (-0.9 percentage points, P=0.8). Physicians using the LLM also spent more time per case (801.5 versus 690.2 seconds, 119.2-second difference, P=0.022). A post hoc sensitivity analysis adjusting for time still showed a 5.4-percentage-point increase in score (P=0.004).
Bottom line: Use of LLMs may help improve physician performance in inpatient management reasoning, a crucial part of every hospitalist’s clinical practice.
Citation: Goh E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025;31:1233–1238. https://doi.org/10.1038/s41591-024-03456-y.
Dr. Wijesekera is a hospitalist at Yale New Haven Hospital and an assistant professor of medicine at Yale School of Medicine, both in New Haven, Conn. Disclosure: Dr. Wijesekera is a consultant on clinical reasoning content for McGraw-Hill and the National Board of Medical Examiners.