GPT-4 Assistance for Improvement of Physician Performance on Patient Care Tasks: A Randomized Controlled Trial

Clinical question: Does large language model (LLM) assistance improve physician performance on open-ended management reasoning tasks compared to conventional resources?

Background: Management reasoning is a newer area within clinical reasoning that covers decisions about testing, treatment, goals of care, and availability of resources. Unlike diagnostic reasoning, management often has no single correct answer, and it requires prioritization, ongoing monitoring, and communication with the patient. While LLMs have shown effectiveness in diagnostic reasoning, little is known about their performance in management reasoning.

Study design: Prospective, randomized, controlled trial

Setting: Virtual; participants took part either remotely or in an in-person computer laboratory

Synopsis: A total of 92 physicians were enrolled from November 2023 to April 2024; most were attendings, and most had internal medicine training. A total of 400 clinical vignette responses were scored: 176 from physicians using the LLM (GPT-4), 199 from physicians using conventional resources (e.g., UpToDate, Google), and 25 from the LLM alone. An iterative modified Delphi process was used to refine the rubric for scoring each case. Physicians using the LLM scored higher than those using conventional resources (43.0% versus 35.7%; reported difference 6.5 percentage points, P<0.001). There was no statistically significant difference between physicians using the LLM and the LLM alone (-0.9 percentage points, P=0.8). Physicians using the LLM also spent more time per case (801.5 versus 690.2 seconds; reported difference 119.2 seconds, P=0.022). A post-hoc sensitivity analysis adjusting for time still showed a 5.4 percentage-point increase in score (P=0.004).
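As a quick arithmetic check on the synopsis figures, the short Python sketch below recomputes the vignette total and the raw between-group gaps. The variable names are illustrative and not taken from the study. The raw subtractions do not exactly match the reported differences; one plausible explanation, which the synopsis itself does not state, is that the reported differences are model-based estimates rather than simple differences of group means.

```python
# Figures copied from the synopsis; variable names are illustrative only.
cases_llm_assisted = 176      # physician responses with LLM access
cases_conventional = 199      # physician responses with conventional resources
cases_llm_alone = 25          # responses produced by the LLM alone

score_llm_pct = 43.0          # score, physicians using the LLM
score_conventional_pct = 35.7 # score, physicians using conventional resources
reported_score_diff = 6.5     # difference as reported in the synopsis

time_llm_s = 801.5            # seconds per case, LLM group
time_conventional_s = 690.2   # seconds per case, conventional group
reported_time_diff_s = 119.2  # difference as reported in the synopsis

# Recompute totals and raw (unadjusted) gaps between the two physician groups.
total_cases = cases_llm_assisted + cases_conventional + cases_llm_alone
raw_score_diff = round(score_llm_pct - score_conventional_pct, 1)
raw_time_diff_s = round(time_llm_s - time_conventional_s, 1)

print(f"Total vignette responses: {total_cases}")
print(f"Raw score gap: {raw_score_diff} points (reported: {reported_score_diff})")
print(f"Raw time gap: {raw_time_diff_s} s (reported: {reported_time_diff_s})")
```

The three counts sum to the 400 vignettes stated in the synopsis, while the raw score and time gaps are somewhat different from the reported values, so the reported differences should be read as the trial's own estimates rather than as simple subtractions.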

Bottom line: Use of LLMs may help improve physician performance in inpatient management reasoning, a crucial part of every hospitalist’s clinical practice.

Citation: Goh E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025;31:1233–1238. https://doi.org/10.1038/s41591-024-03456-y

Dr. Wijesekera is a hospitalist at Yale New Haven Hospital and an assistant professor of medicine at Yale School of Medicine, both in New Haven, Conn. Disclosure: Dr. Wijesekera is a consultant on clinical reasoning content for McGraw-Hill and the National Board of Medical Examiners.
