Abstract Details

Title Automated Detection of Cognitive-Linguistic Impairments from Connected Speech Using Conventional and LLM-Derived Linguistic Features

Topic Aging, Dementia, and Behavioral Neurology

Presentation(s) N1 - Neuroscience in the Clinic: Innovations in Aphasia and Apraxia of Speech (4:10 PM-4:20 PM)

Poster/Presentation Number 002

Objective

To determine whether computational linguistic features derived from picture description tasks can automatically detect cognitive-linguistic impairments identified by speech-language pathologists.

Background

Aphasia and related cognitive-linguistic impairments are important markers of neurological injury and disease, but automated measures of these impairments remain limited. Advances in natural language processing (NLP), including measures derived from large language models (LLMs), enable automated extraction of lexical and semantic features that may provide efficient and consistent markers of cognitive-linguistic impairment.

Design/Methods

We analyzed 1,013 picture description recordings annotated by speech-language pathologists for five impairment types: grammatical errors, semantic errors, nonspecific terms, other cognitive-communication deficits, and word/phrase repetitions. After transcription with CrisperWhisper, we extracted conventional NLP metrics, including lexical diversity (type–token ratio, unique word count), syntactic complexity (sentence length, part-of-speech distributions), and readability indices, as well as two LLM-derived features: surprisal (Gemma 7B) and semantic deviation (all-mpnet-base-v2). Average word surprisal represents how unexpected each word is given its prior context. Semantic deviation is the cosine distance from each transcript's embedding to the average embedding of healthy controls. To mitigate demographic confounding, one-to-two propensity score matching on age and gender yielded 306 participants (102 cases with at least one impairment annotated as present and 204 controls). Logistic regression models with leave-one-out cross-validation were used to predict each annotation.
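The three feature definitions above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the per-word probabilities and embedding vectors are assumed to come from a language model and a sentence-embedding model respectively (here they are toy numbers), and the choice of base-2 logarithms for surprisal is an assumption, since the abstract does not state the unit.

```python
import math
from typing import Sequence


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Unique word forms divided by total words (case-insensitive)."""
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def mean_surprisal(word_probs: Sequence[float]) -> float:
    """Average of -log2 p(word | prior context), in bits (assumed unit)."""
    return sum(-math.log2(p) for p in word_probs) / len(word_probs)


def centroid(vectors: Sequence[Sequence[float]]) -> list:
    """Element-wise mean of a set of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_deviation(embedding: Sequence[float],
                       control_embeddings: Sequence[Sequence[float]]) -> float:
    """Cosine distance from one transcript to the mean control embedding."""
    return cosine_distance(embedding, centroid(control_embeddings))


# Toy inputs only; real values would come from the transcripts and models.
tokens = "the boy is on the stool".split()
ttr = type_token_ratio(tokens)                   # 5 unique words out of 6
surprisal = mean_surprisal([0.5, 0.25])          # (1 bit + 2 bits) / 2
deviation = semantic_deviation([0.0, 1.0], [[1.0, 0.0], [1.0, 0.0]])
```

In this sketch a transcript whose embedding is orthogonal to the control average gets a deviation of 1, and one that points in the same direction gets 0. A language model typically scores subword tokens rather than words, so a real implementation would also need a rule for aggregating token probabilities into word-level surprisal.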

Results

Grammatical errors were detected with the highest discrimination (AUC=0.93) using surprisal alone, while semantic errors (AUC=0.71) were best captured by surprisal and lexical diversity. Surprisal and semantic deviation were predictive of other cognitive-communication deficits (AUC=0.75), whereas semantic deviation alone performed well for nonspecific terms (AUC=0.79). Type–token ratio showed moderate discrimination for word/phrase repetitions (AUC=0.71).
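For readers less familiar with the AUC values reported here, the following Python sketch shows one standard way to compute the statistic: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name and the toy labels and scores are hypothetical, and it is an assumption that the held-out predictions from leave-one-out cross-validation were pooled before scoring; the abstract does not say how the AUC was aggregated.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise case/control comparison."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0      # case ranked above control
            elif case_score == control_score:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy held-out predictions: two cases (label 1) and two controls (label 0).
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy data, three of the four case/control pairs are ordered correctly, so the AUC is 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.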

Conclusions

Large language model-derived measures complement traditional NLP features and can aid automatic detection of diverse cognitive-linguistic impairments. This approach highlights the potential of NLP-based pipelines to enhance clinical screening efficiency and consistency for cognitive-communication disorders.

Authors/Disclosures
Sepideh Jamali Dogahe PRESENTER	Sepideh Jamali Dogahe has nothing to disclose.
Joseph Duffy	The institution of Joseph Duffy has received research support from NIH. Joseph Duffy has received publishing royalties from a publication relating to health care.
Leland Barnard (Mayo Clinic)	Leland Barnard has nothing to disclose.
John L. Stricker (Mayo Clinic)	The institution of an immediate family member of John L. Stricker has received research support from NIH, Mayo Foundation for Medical Education & Research. An immediate family member of John L. Stricker has received intellectual property interests from a discovery or technology relating to health care. John L. Stricker has received intellectual property interests from a discovery or technology relating to health care.
Rene Utianski	Rene Utianski has nothing to disclose.
Hugo Botha, MD (Mayo School of Graduate Medical Education, Rochester)	Dr. Botha has received research support from NIH. An immediate family member of Dr. Botha has received personal compensation in the range of $500-$4,999 for serving as a Study Section Member with NIH.
