Abstract Details

Title AI LLM Decision Support in Epilepsy Surgery: A Real-world Retrospective Concordance Study Comparing GPT-5 to Expert Case Conference Consensus

Topic Epilepsy/Clinical Neurophysiology (EEG)

Presentation(s) S29 - Epilepsy: Basic Science and Mechanisms (4:54 PM-5:06 PM)

Poster/Presentation Number 008

Objective

To evaluate the concordance between artificial intelligence (AI) large language model (LLM) recommendations and multidisciplinary epilepsy surgery case conference (EPCC) consensus, and to explore the potential role of LLMs in supporting epilepsy surgery decisions in resource-constrained community settings.

Background

Multidisciplinary EPCCs remain the gold standard for presurgical decision-making in drug-resistant epilepsy (DRE). LLMs, capable of integrating multimodal data, may serve as adjunct decision-support tools by synthesizing complex, unstructured clinical information. To date, no study has systematically compared LLM-generated recommendations with EPCC consensus or examined how concordance varies by model type or user expertise.

Design/Methods

Standardized case vignettes (clinical, EEG, imaging, and neuropsychological data) from patients with DRE evaluated at our center were submitted to multiple open-access LLMs. Model outputs were compared with EPCC consensus across four domains: primary recommendation, lateralization, invasive monitoring targets, and ancillary testing. Concordance was scored, and Cohen’s κ was calculated for inter-model and inter-user comparisons.
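As a minimal illustration of how per-case concordance and Cohen’s κ might be computed for two raters (for example, EPCC consensus versus one model’s primary recommendation), the Python sketch below uses only the standard library. The abstract does not specify the scoring scheme, so treating each domain as one categorical label per case is an assumption, and the category names and six-case example are hypothetical, not study data. κ is computed as (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater’s marginal label frequencies.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of cases where both raters assigned the same category."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("raters must label the same, non-empty set of cases")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: for each category, multiply the two raters' marginal
    # counts, sum over categories, and normalize by n * n.
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical primary-recommendation labels for six cases (not study data).
epcc = ["resection", "resection", "invasive", "invasive", "neuromodulation", "resection"]
llm = ["resection", "invasive", "invasive", "invasive", "neuromodulation", "neuromodulation"]

print(f"concordance = {percent_agreement(epcc, llm):.1%}")
print(f"kappa = {cohens_kappa(epcc, llm):.2f}")
```

Here the raters agree on 4 of 6 cases (p_o ≈ 0.667), chance agreement is p_e = 11/36, and κ = 0.52. The ± values reported in the abstract imply some aggregation or uncertainty estimate (for example, across domains or models) that this sketch does not attempt to reproduce.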

Results

Initial pilot testing (10 cases) compared general-purpose models (GPT-5, Gemini, Meta AI) with a domain-specific model (OpenEvidence). OpenEvidence achieved higher concordance (κ = 0.42 ± 0.22) than the general-purpose models. The focused GPT-5 analysis (30 vignettes) included exploratory inter-user comparisons between epilepsy expert-guided and novice-guided use. GPT-5 achieved identical overall concordance with EPCC consensus (56.7%) for both users, with moderate agreement between expert- and novice-guided GPT outputs (κ = 0.54 ± 0.17, p < 0.01). However, both configurations demonstrated reduced confidence in lateralization, invasive targeting, and ancillary test recommendations.
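The abstract describes κ = 0.54 as moderate agreement but does not name its interpretive scale. That label is consistent with the widely used Landis and Koch bands, so the short sketch below assumes those cut-offs; the function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement bands (assumed scale)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bound (inclusive) of each band above zero.
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


for k in (0.42, 0.54):  # point estimates reported in the abstract
    print(k, landis_koch_label(k))
```

Under this assumed scale, both reported point estimates fall in the "moderate" band; the wide ± ranges mean the underlying agreement could plausibly sit in an adjacent band.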

Conclusions

Open-access LLMs can partially replicate multidisciplinary EPCC decision-making, achieving modest concordance across key domains. While not a substitute for expert consensus, LLMs may provide complementary decision support, particularly in community or resource-limited settings. The minimal inter-user differences observed with GPT-5 suggest practical potential for standardized use without extensive user pre-training; larger prospective validation is needed to determine clinical utility and safety.

Authors/Disclosures
Yixin Jia PRESENTER	Miss Jia has received personal compensation for serving as an employee of Monell Chemical Senses Center.
Dorris Luong, NP	Ms. Luong has nothing to disclose.
Shahnawaz Karim, MBBS (Kaiser Permanente)	Dr. Karim has received personal compensation for serving as an employee of TPMG.
Ning Zhong, MD (KP Medical Center)	Dr. Zhong has nothing to disclose.
