
Artificial intelligence (AI) is entering hospital medicine at a moment when the field needs it most and when the risks of getting it wrong are high. For many hospitalists, interest in AI has grown out of day-to-day pressures: notes completed late at night, fragmented data streams, and a sense that cognitive work is being squeezed by administrative tasks. In response to these pressures, AI tools are increasingly being explored as ways to support information management and clinical work. But without rigorous research, the real-world effects of AI on clinical reasoning and patient care remain uncertain, and its broader implications for our specialty are unclear.
Our research teams, composed primarily of hospitalists and spanning multiple academic medical centers, have been studying these tools in real clinical and educational settings to answer critical questions: How do AI scribes affect learner development? Can large language models (LLMs) actually improve diagnostic accuracy? What happens to clinical reasoning when algorithms enter the decision-making process? How do we implement these tools safely and at scale? Underlying all of these is a more pragmatic concern: What changes when AI becomes part of the ordinary cognitive environment of inpatient care?
Understanding AI’s Impact on Clinical Reasoning Development
AI tools might be the fastest-adopted health technology in history, and they’ve crept into virtually every part of hospitalist workflows. The most visible integration may be in clinical reasoning itself: OpenEvidence, an LLM-based system that references published medical literature, is now used by more than 40% of U.S. physicians for point-of-care decision support.1 Alongside these reasoning-assist platforms, ambient listening systems that capture clinical encounters and generate notes automatically are rapidly expanding. Implementation of AI scribes on hospital medicine services has lagged behind primary care; some institutions are already piloting these tools on busy teaching services, while others discuss them primarily as a burnout-mitigation strategy. Although often framed as efficiency solutions, their educational impact has received far less attention than their workflow effects.
A multi-site study now underway examines whether exposure to AI scribes during training influences how residents develop clinical reasoning. One important question raised by this work is whether trainees who spend less time constructing clinical narratives (e.g., problem lists, assessments, and plans) develop those skills differently over time. In daily practice, this is not a hypothetical concern; it shows up when residents struggle to explain why a plan changed or how competing problems were prioritized. At the same time, it is also possible that AI-assisted documentation reduces extraneous cognitive load and allows learners to focus more deliberately on synthesis and decision making. The study will track both possibilities without assuming in advance which effect will dominate.
In parallel, early research collaborations are developing evaluation frameworks for groups considering ambient AI adoption. These toolkits aim to provide standardized approaches to assess utility, detect harms, and train clinicians to recognize AI-generated errors before they reach patients. Rather than positioning AI as something to embrace or resist, this work reflects a practical question hospital leaders are already asking: “How do we know if this is actually helping?”
Testing AI’s Effect on Diagnostic and Management Decisions
While ambient AI automates documentation, LLMs are also being tested as direct clinical reasoning support. Recent randomized controlled trials have produced some of the first empirical data on how AI assistance influences physician decision-making in diagnostic and management tasks.
The findings reveal both promise and risk. In studies published in JAMA Network Open and Nature Medicine, physicians using AI assistance demonstrated improved diagnostic accuracy on complex cases.2,3 However, the same studies identified automation bias, the tendency to over-rely on algorithmic suggestions even when they are incorrect. For clinicians accustomed to making rapid decisions with incomplete information, this creates a familiar but uncomfortable dynamic: confidence without full understanding.
Additional research in JAMA and JAMA Internal Medicine shows that LLMs can estimate diagnostic probabilities and complete structured reasoning tasks at levels approaching physician performance.4,5 What remains less clear is how these outputs are interpreted once they enter real clinical workflows, particularly when time pressure, interruptions, and competing priorities are present.
A Lancet perspective situates modern reasoning-model AI within the lineage of cognitive psychology and script theory, arguing that while LLMs may display “emergent” reasoning, true clinical reasoning remains a human, contextual act.6 Work published in the Journal of Hospital Medicine and NEJM AI emphasizes the need for human benchmarking, transparency about AI limitations, and systems that support, rather than replace, clinical judgment.7,8
For hospital medicine, these studies matter because they address the actual cognitive work hospitalists do: synthesizing fragmented data, managing competing priorities, and making decisions under time pressure. Understanding how AI changes these processes is essential before integrating it into daily practice.
Defining What Physicians Need to Know About AI
Perhaps the most consequential research involves translating frontline findings into educational competencies and implementation standards. Work with the Association of American Medical Colleges and other national organizations is defining what future physicians must know to practice safely in AI-augmented environments.
This includes practical skills: recognizing automation bias, auditing AI-generated content for errors, identifying moments when algorithmic suggestions warrant closer scrutiny, and maintaining clinical reasoning skills even when AI assistance is available. It also includes systems-level competencies: evaluating AI tools before adoption, advocating for transparent implementation, and participating in institutional governance around AI use.
Research on management reasoning, the process hospitalists use to decide what to do next amid uncertainty and competing demands, provides a useful lens for teaching and assessing these skills.9,10 By examining how AI interacts with the mechanics of clinical decision-making, this work contributes shared language to conversations many hospitalists are already having.
What This Means for Hospitalists
The research emerging from hospitalist-led studies offers several areas for consideration:
For clinical practice: AI tools show potential to enhance diagnostic accuracy while also introducing new cognitive risks. Hospitalists should approach AI assistance with informed skepticism, using it as a reasoning aid while maintaining independent clinical judgment. Institutions implementing AI should provide training on recognizing automation bias and auditing AI outputs.
For education: Ambient AI may affect how trainees develop reasoning skills. Programs adopting these tools may wish to monitor learner outcomes and preserve opportunities for trainees to practice the cognitive work of clinical synthesis. Deploying AI without attention to educational context may have unintended effects on skill development.
For institutional adoption: Hospitals considering AI tools are likely to benefit from rigorous evaluation frameworks before widespread deployment. The evidence-based toolkits now being developed provide methods to assess utility, detect harms, and optimize implementation, moving beyond vendor claims to actual measurement of impact on workflow, safety, and clinical outcomes.
For the specialty: Hospital medicine sits at the intersection of clinical complexity and systems thinking, placing hospitalists in a position to help translate AI capabilities into clinical contexts. As this technology improves, ongoing engagement from hospitalist researchers and frontline clinicians will be important, focusing on how AI changes not just what we do, but how we think.
Moving Forward
The hospitalist-led research described here represents work in progress within a rapidly advancing field. Many critical questions remain unanswered: What is AI’s long-term impact on diagnostic skill maintenance? How do we prevent algorithmic disparities from amplifying existing healthcare inequities? What regulatory frameworks balance AI safety with innovation?
What is becoming clearer is that hospitalists are active participants in how AI technologies are studied, implemented, and evaluated. The cognitive work of hospital medicine—rapid decision-making, synthesis of incomplete data, and coordination across fragmented systems—means that even modest changes in decision support can have meaningful downstream effects.
Hospitalist researchers are building the evidence base. Whether that evidence translates into better care will depend on clinicians who bring not just curiosity about AI, but the judgment to know when it helps and when it doesn’t.
Dr. Jones is an associate professor of medicine in the division of hospital medicine at Oregon Health & Science University in Portland, Ore., where he also chairs the OHSU School of Medicine’s entrustment group. He co-chairs national efforts on best practices for implementing ambient AI and helps develop national physician AI competencies.
Dr. Rodman is a hospitalist at Beth Israel Deaconess Medical Center, assistant professor at Harvard Medical School, and director of AI programs at the Carl J. Shapiro Center for Education and Research, all in Boston. He is also an associate editor for NEJM AI.
Dr. Parsons is an associate professor of medicine at the University of Virginia School of Medicine in Charlottesville, Va., where he also serves as associate dean for clinical competency and director for research and academic advancement in the division of hospital medicine. He is the principal investigator on the Clinical Reasoning Research Collaborative at UVA, site lead for the ARISE research network, which evaluates AI through real-world studies, and associate editor for the journal Diagnosis.
References
1. OpenEvidence collaborates with Microsoft to expand AI leadership in healthcare. OpenEvidence website. https://www.openevidence.com/announcements/openevidence-collaborates-with-microsoft-to-expand-ai-leadership-in-healthcare-bringing-clinical-evidence-and-guidelines-to-enterprise-clinician-workflows. Published October 16, 2025. Accessed January 10, 2026.
2. Goh E, et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw Open. 2024;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969.
3. Goh E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025;31(4):1233-1238. doi: 10.1038/s41591-024-03456-y.
4. Kanjee Z, et al. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330(1):78-80. doi: 10.1001/jama.2023.8288.
5. Cabral S, et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA Intern Med. 2024;184(5):581-583. doi: 10.1001/jamainternmed.2024.0295.
6. Rodman A, Topol EJ. Is generative artificial intelligence capable of clinical reasoning? Lancet. 2025;405(10480):689. doi: 10.1016/S0140-6736(25)00348-4.
7. Rodman A, Kanjee Z. The promise and peril of generative artificial intelligence for daily hospitalist practice. J Hosp Med. 2024;19(12):1188-1193. doi: 10.1002/jhm.13363.
8. McCoy L, et al. Assessment of large language models in clinical reasoning: a novel benchmarking study. NEJM AI. 2025;2(10). doi: 10.1056/AIdbp2500120.
9. Parsons AS, et al. How postgraduate medical trainees conceptualise management reasoning: a qualitative study. Med Educ. 2025. doi: 10.1111/medu.70123.
10. Dreicer JJ, et al. The diagnostic medical interview. Med Clin North Am. 2022;106(4):601-614. doi: 10.1016/j.mcna.2022.01.005.