Three widely used general-purpose LLMs (GPT, Claude, and Gemini) will be evaluated using standardized text prompts generated from the MSLesSeg dataset. Each case will be represented by structured lesion data: lesion volume, lesion count, and anatomical distribution across periventricular, juxtacortical, infratentorial, and spinal regions. Models will be tasked with (1) classifying each lesion pattern as typical or atypical for MS and (2) generating a structured, radiology-style lesion description. Classification performance will be assessed with accuracy and F1 score, and model outputs will be analyzed for hallucinations and error rates. Intra-model consistency will also be examined by submitting the same prompt to each model repeatedly and comparing the outputs.
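As a minimal sketch of how a case could be turned into a standardized prompt, the Python below assumes each MSLesSeg case has already been reduced to a structured summary. The `LesionCase` fields (including the use of mL for volume), the `build_prompt` helper, and the prompt wording are hypothetical illustrations, not the study's actual template.

```python
from dataclasses import dataclass, field

# Region order used in every prompt (regions taken from the proposal text).
REGIONS = ("periventricular", "juxtacortical", "infratentorial", "spinal")


@dataclass
class LesionCase:
    """Hypothetical structured summary of one MSLesSeg case."""
    case_id: str
    lesion_volume_ml: float  # assumed unit: millilitres
    lesion_count: int
    # Lesions per anatomical region; regions not listed are treated as 0.
    region_counts: dict[str, int] = field(default_factory=dict)


def build_prompt(case: LesionCase) -> str:
    """Render one case with a fixed template covering both tasks (hypothetical wording)."""
    region_lines = "\n".join(
        f"- {region}: {case.region_counts.get(region, 0)} lesion(s)"
        for region in REGIONS
    )
    return (
        f"Case ID: {case.case_id}\n"
        f"Total lesion volume: {case.lesion_volume_ml:.1f} mL\n"
        f"Lesion count: {case.lesion_count}\n"
        f"Anatomical distribution:\n{region_lines}\n\n"
        "Task 1: Classify this lesion pattern as TYPICAL or ATYPICAL for MS. "
        "Answer with a single word.\n"
        "Task 2: Write a structured, radiology-style description of the lesions."
    )


example = LesionCase("P01", 12.3, 9, {"periventricular": 5, "juxtacortical": 3, "infratentorial": 1})
print(build_prompt(example))
```

Keeping the template fixed and varying only the case fields means every model receives identical instructions, so differences in output can be attributed to the model rather than to prompt wording.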
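The classification metrics could be computed along the following lines. This sketch assumes the model is asked for a single-word answer, treats `atypical` as the positive class for F1 (an assumption; the proposal does not specify one), and counts unparseable answers as incorrect. `parse_label` and `classification_metrics` are hypothetical helper names.

```python
from typing import Optional, Sequence

LABELS = {"typical", "atypical"}


def parse_label(response: str) -> Optional[str]:
    """Return 'typical' or 'atypical' from a single-word answer, else None."""
    words = response.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!")
    return first if first in LABELS else None


def classification_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[Optional[str]],
    positive: str = "atypical",  # assumed positive class for F1
) -> dict[str, float]:
    """Accuracy, binary F1, and the share of unparseable (None) answers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)  # includes None answers
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "invalid_rate": sum(p is None for p in y_pred) / len(pairs),
    }


truth = ["atypical", "typical", "atypical", "typical"]
answers = ["Atypical.", "typical", "I am unsure", "atypical"]
preds = [parse_label(a) for a in answers]
print(classification_metrics(truth, preds))
# → {'accuracy': 0.5, 'f1': 0.5, 'invalid_rate': 0.25}
```

Reporting the invalid-answer rate alongside accuracy keeps format failures visible instead of silently folding them into misclassifications.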
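One simple way to quantify intra-model consistency is modal agreement: for each case, the fraction of repeated runs that return the model's most frequent label. The sketch below assumes labels have already been parsed from each run; `modal_agreement` and `consistency_summary` are hypothetical names, and a chance-corrected statistic such as Fleiss' kappa could be used instead.

```python
from collections import Counter
from typing import Optional, Sequence


def modal_agreement(labels: Sequence[Optional[str]]) -> float:
    """Fraction of repeated runs that match the most frequent label for one case."""
    if not labels:
        raise ValueError("need at least one run per case")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def consistency_summary(runs_per_case: dict[str, list[Optional[str]]]) -> dict[str, float]:
    """Mean modal agreement and the share of cases answered identically on every run."""
    scores = [modal_agreement(runs) for runs in runs_per_case.values()]
    return {
        "mean_modal_agreement": sum(scores) / len(scores),
        "unanimous_fraction": sum(s == 1.0 for s in scores) / len(scores),
    }


# Hypothetical example: five repeated runs of the same prompt for two cases.
runs = {
    "P01": ["typical"] * 5,
    "P02": ["atypical", "atypical", "typical", "atypical", "typical"],
}
print(consistency_summary(runs))
```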