Abstract Details

Automated Detection of Cognitive-Linguistic Impairments from Connected Speech Using Conventional and LLM-Derived Linguistic Features
Aging, Dementia, and Behavioral Neurology
N1 - Neuroscience in the Clinic: Innovations in Aphasia and Apraxia of Speech (4:10 PM-4:20 PM)
002

To determine whether computational linguistic features derived from picture description tasks can automatically detect cognitive-linguistic impairments identified by speech-language pathologists.

Aphasia and related cognitive-linguistic impairments are important markers of neurological injury and disease, but automated measures of these impairments remain limited. Advances in natural language processing (NLP), including measures derived from large language models (LLMs), enable automated extraction of lexical and semantic features that may provide efficient and consistent markers of cognitive-linguistic impairment.

We analyzed 1,013 picture description recordings annotated by speech-language pathologists for five impairment types: grammatical errors, semantic errors, nonspecific terms, other cognitive-communication deficits, and word/phrase repetitions. After transcription with CrisperWhisper, we extracted conventional NLP metrics, such as lexical diversity (type–token ratio, unique word count), syntactic complexity (sentence length, part-of-speech distributions), and readability indices, as well as two LLM-derived features: surprisal (Gemma 7B) and semantic deviation (all-mpnet-base-v2). Average word surprisal represents how unexpected each word is given its prior context. Semantic deviation is the cosine distance from each transcript's embedding to the average embedding of healthy controls. To mitigate demographic confounding, one-to-two propensity score matching on age and gender yielded 306 participants (102 cases with at least one feature annotated as present and 204 controls). Logistic regression models with leave-one-out cross-validation were used to predict each annotation.
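The three feature definitions above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the per-word probabilities are assumed to have already been obtained from a language model, and the embeddings are assumed to be plain lists of floats from a sentence-embedding model. Surprisal is computed here in bits (base-2 logarithm), which is an assumption, since the abstract does not state the unit.

```python
import math


def type_token_ratio(tokens):
    """Unique word types divided by total tokens (0.0 for an empty transcript)."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


def average_surprisal(token_probs):
    """Mean of -log2 p(word | prior context) over a transcript, in bits.

    token_probs is assumed to hold the model's probability for each word
    given its preceding context; obtaining those values is out of scope here.
    """
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_deviation(embedding, control_embeddings):
    """Cosine distance from one transcript embedding to the control centroid."""
    n = len(control_embeddings)
    # Average the control embeddings dimension by dimension.
    centroid = [sum(column) / n for column in zip(*control_embeddings)]
    return cosine_distance(embedding, centroid)
```

Under these definitions, a transcript that reuses the same few words has a low type-token ratio, improbable word choices raise average surprisal, and a description that drifts from what healthy controls typically say about the picture has a larger semantic deviation.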

Grammatical errors were detected with the highest discrimination (AUC=0.93) using surprisal alone, while semantic errors (AUC=0.71) were best captured by surprisal and lexical diversity. Surprisal and semantic deviation were predictive of other cognitive-communication deficits (AUC=0.75), whereas semantic deviation alone performed well for nonspecific terms (AUC=0.79). Type-token ratio showed moderate discrimination for word/phrase repetitions (AUC=0.71).
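For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. With leave-one-out cross-validation each fold holds out a single participant, so one common approach (assumed here; the abstract does not say how the AUCs were aggregated) is to pool the held-out predictions and compute a single AUC over them. The sketch below uses a hypothetical `roc_auc` helper and made-up scores purely for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for case, 0 for control.
    scores: pooled held-out predicted probabilities, one per participant.
    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    correct = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Illustrative values only, not data from the study.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this toy example three of the four case-control pairs are ordered correctly, so `example_auc` is 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.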

Large language model-derived measures complement traditional NLP features and can aid automatic detection of diverse cognitive-linguistic impairments. This approach highlights the potential of NLP-based pipelines to enhance clinical screening efficiency and consistency for cognitive-communication disorders.

Authors/Disclosures
Sepideh Jamali Dogahe
PRESENTER
Sepideh Jamali Dogahe has nothing to disclose.
Joseph Duffy The institution of Joseph Duffy has received research support from NIH. Joseph Duffy has received publishing royalties from a publication relating to health care.
Leland Barnard (Mayo Clinic) Leland Barnard has nothing to disclose.
John L. Stricker (Mayo Clinic) The institution of an immediate family member of John L. Stricker has received research support from NIH, Mayo Foundation for Medical Education & Research. An immediate family member of John L. Stricker has received intellectual property interests from a discovery or technology relating to health care. John L. Stricker has received intellectual property interests from a discovery or technology relating to health care.
Rene Utianski Rene Utianski has nothing to disclose.
Hugo Botha, MD (Mayo School of Graduate Medical Education, Rochester) Dr. Botha has received research support from NIH. An immediate family member of Dr. Botha has received personal compensation in the range of $500-$4,999 for serving as a Study Section Member with NIH.