Abstract Details

Title Evaluating the Accuracy of Large Language Models in Stroke Management

Topic Cerebrovascular Disease and Interventional Neurology

Presentation(s) P10 - Poster Session 10 (8:00 AM-9:00 AM)

Poster/Presentation Number 4-020

Objective To evaluate the accuracy of large language model artificial intelligence tools (LLM-AI) in acute and inpatient stroke management.

Background Artificial intelligence is being investigated as a tool for medical decision making. AI imaging analysis software (e.g., Rapid AI, Brainomix, VizAI, Aidoc) is used as an adjunct to medical decision making in patients with acute stroke. LLM-AI tools (ChatGPT, DoximityGPT, OpenEvidence) have been used in research studies to help make medical decisions. However, their accuracy in stroke diagnosis and management has not been systematically assessed.

Design/Methods We presented 25 real-life stroke scenarios (5 cases for each of the 5 stroke mechanisms defined by the TOAST criteria) to three LLMs (ChatGPT, DoximityGPT, OpenEvidence). Cases were collected from October 2024 to September 2025, and the LLMs were prompted from September 20 to October 1, 2025, using the most recent model versions. The accuracy of LLM-AI in medical decision making was evaluated using criteria including adherence to guidelines, identification of stroke mechanism, and treatment decisions. AI responses were compared with decisions provided by two neurologists.

Results Accuracy in identifying stroke mechanism ranged from 60% to 76% and was highest for large artery atherosclerosis and lowest for embolic stroke of undetermined source. Accuracy of acute stroke management decisions was lower than expected: LLM-AI answers were correct in about 44% to 68% of cases, with OpenEvidence providing the most accurate decisions (68%). Overall adherence to guidelines ranged from 60% to 74%, with ChatGPT achieving the highest overall adherence (74%).

Conclusions LLM-AI demonstrated moderate accuracy in stroke management. Future work should explore integrating LLM-AI into clinical workflows as an adjunctive tool.

Authors/Disclosures
Sahil S. Suvarna, DO PRESENTER	Dr. Suvarna has nothing to disclose.
Karim Makhoul, Sr., MD	Dr. Makhoul has nothing to disclose.
Ahmad Shamulzai	Ahmad Shamulzai has nothing to disclose.
Angela Xia, MS	Ms. Xia has nothing to disclose.
Gregory Kurgansky, DO	Gregory Kurgansky, DO has nothing to disclose.
Richard Libman, MD, FAAN (Northwell Health)	Dr. Libman has nothing to disclose.
