
Abstract Details

Assessment of ChatGPT’s performance on Neurology Written Board Examination Questions
Research and Methodology
P5 - Poster Session 5 (5:30 PM-6:30 PM)
7-002
To evaluate the performance of ChatGPT in answering neurology board-styled questions
Artificial intelligence (AI) models like ChatGPT have gained prominence in various professional fields, including healthcare. To further study the possible utility of this novel tool in a healthcare setting, we evaluated the performance of ChatGPT in answering neurology board-styled questions.
Neurology board-style questions were accessed from Board Vitals, a commercial neurology question bank. ChatGPT (GPT-4, via Microsoft Bing Chat) was provided the full question prompt and answer choices, and was given up to three attempts to select the correct answer; first-attempt and three-attempt performance were recorded separately. A total of 560 questions (14 blocks of 40 questions) were used, although image-based questions were excluded because ChatGPT cannot process visual input. The AI's answers were then compared with human user data provided by the question bank to gauge its performance.
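The retry protocol described above can be sketched as follows. This is an interpretation, not the authors' actual harness: `grade_with_retries` and `ask_model` are hypothetical names, and `ask_model` stands in for the ChatGPT/Bing Chat interface, which the abstract does not specify.

```python
# Sketch of the up-to-three-attempts scoring protocol (an assumption-laden
# illustration; `ask_model` is a stand-in for the unspecified chat interface).

def grade_with_retries(question, choices, answer, ask_model, max_attempts=3):
    """Return the 1-based attempt on which the model answered correctly,
    or None if it missed on all attempts (up to three, per the abstract)."""
    for attempt in range(1, max_attempts + 1):
        if ask_model(question, choices) == answer:
            return attempt
    return None

# Toy usage with a stubbed model that always answers "B":
stub = lambda question, choices: "B"
print(grade_with_retries("Which cranial nerve...?", ["A", "B", "C", "D"], "B", stub))
```

A question counts toward first-attempt accuracy only when the returned attempt number is 1, and toward three-attempt accuracy whenever the return value is not `None`.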
Out of 509 eligible questions across 14 question blocks, ChatGPT correctly answered 335 questions (65.8%) on the first attempt and 383 (75.3%) within three attempts, corresponding to approximately the 26th and 50th percentiles, respectively. The highest-performing subjects were Pain (100%), Epilepsy & Seizures (85%), and Genetics (82%), while the lowest-performing were Imaging/Diagnostic Studies (27%), Critical Care (41%), and Cranial Nerves (48%).
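As a quick arithmetic check, the accuracy figures follow directly from the counts reported above (the percentile mapping depends on the question bank's user data and cannot be reproduced here):

```python
# Recompute the headline accuracies from the reported counts.
eligible = 509       # questions remaining after excluding image-based items
first_correct = 335  # correct on the first attempt
three_correct = 383  # correct within three attempts

print(f"first attempt:  {first_correct / eligible:.2%}")
print(f"three attempts: {three_correct / eligible:.2%}")
```

These ratios (about 65.82% and 75.25%) are consistent with the 65.8% and 75.3% reported in the abstract.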

This study found that ChatGPT performed comparably to its human counterparts. The AI's accuracy increased with additional attempts, and its overall performance fell within the expected range for neurology learners. These results demonstrate ChatGPT's potential for processing specialized medical information. Future studies should better define the extent to which AI can be integrated into medical decision-making.

Authors/Disclosures
Tse Chiang Chen, MD (Tulane School of Medicine)
PRESENTER
Dr. Chen has nothing to disclose.
Evan Multala No disclosure on file
Patrick Kearns, MD (Tulane University School of Medicine) No disclosure on file
Arthur Wang, MD (Tulane Center for Clinical Neurosciences) Dr. Wang has nothing to disclose.