Julian Hough

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Jun 29, 2021
Morteza Rohanian, Julian Hough, Matthew Purver

We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating on the ADReSSo 2021 challenge data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves 84% accuracy on AD classification and an RMSE of 4.26 on MMSE cognitive score prediction. While predicting cognitive decline is the more challenging task, our models improve over word-only baselines when given the multimodal features together with word probabilities, disfluency and pause information. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
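
A minimal sketch of the gating idea in PyTorch (our own illustration, not the authors' released code; all layer sizes and names are invented): a BiLSTM encodes the lexical stream, a highway layer refines the fused representation, and a learned gate decides how much of the potentially noisy acoustic vector to admit.

    import torch
    import torch.nn as nn

    class Highway(nn.Module):
        # Highway layer: y = t * relu(W_h x) + (1 - t) * x
        def __init__(self, dim):
            super().__init__()
            self.h = nn.Linear(dim, dim)
            self.t = nn.Linear(dim, dim)

        def forward(self, x):
            t = torch.sigmoid(self.t(x))
            return t * torch.relu(self.h(x)) + (1.0 - t) * x

    class GatedFusionModel(nn.Module):
        def __init__(self, lex_dim, ac_dim, hidden=64):
            super().__init__()
            self.bilstm = nn.LSTM(lex_dim, hidden, batch_first=True,
                                  bidirectional=True)
            self.ac_proj = nn.Linear(ac_dim, 2 * hidden)
            self.gate = nn.Linear(4 * hidden, 2 * hidden)
            self.highway = Highway(2 * hidden)
            self.clf = nn.Linear(2 * hidden, 2)       # AD vs. non-AD

        def forward(self, lex_seq, ac_feats):
            _, (h, _) = self.bilstm(lex_seq)          # h: (2, B, hidden)
            lex = torch.cat([h[0], h[1]], dim=-1)     # (B, 2*hidden)
            ac = torch.tanh(self.ac_proj(ac_feats))   # (B, 2*hidden)
            # Gate in [0, 1] down-weights the acoustic stream when it
            # conflicts with the (more reliable) lexical evidence.
            g = torch.sigmoid(self.gate(torch.cat([lex, ac], dim=-1)))
            return self.clf(self.highway(lex + g * ac))

Swapping the two-way classification head for a single linear output would turn the same architecture into the MMSE regressor.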

* INTERSPEECH 2021. arXiv admin note: substantial text overlap with arXiv:2106.09668 

Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

Jun 17, 2021
Morteza Rohanian, Julian Hough, Matthew Purver

This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods for the automated prediction of Alzheimer's Disease severity from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech, in the context of Alzheimer's Disease diagnosis and mini-mental state examination (MMSE) score prediction. We propose a model that obtains unimodal decisions from separate LSTMs, one for each of the text and audio modalities, and then combines them using a gating mechanism for the final prediction. We focus on sequential modelling of text and audio and investigate whether the disfluencies present in individuals' speech relate to the extent of their cognitive impairment. The proposed classification and regression schemes obtain promising results on both the development and test sets, suggesting that Alzheimer's Disease can be detected successfully through sequence modelling of speech data from medical sessions.
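
A compact sketch of this late-fusion scheme (hypothetical code with invented dimensions; the paper describes the architecture, but this is not its implementation): each modality gets its own LSTM and its own unimodal decision, and a scalar gate learned from both hidden states mixes the two decisions.

    import torch
    import torch.nn as nn

    class UnimodalLSTM(nn.Module):
        def __init__(self, in_dim, hidden, n_out):
            super().__init__()
            self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_out)

        def forward(self, x):
            _, (h, _) = self.lstm(x)
            return h[-1], self.out(h[-1])   # final state + unimodal decision

    class GatedLateFusion(nn.Module):
        def __init__(self, text_dim, audio_dim, hidden=64, n_out=2):
            super().__init__()
            self.text = UnimodalLSTM(text_dim, hidden, n_out)
            self.audio = UnimodalLSTM(audio_dim, hidden, n_out)
            self.gate = nn.Linear(2 * hidden, 1)

        def forward(self, text_seq, audio_seq):
            ht, yt = self.text(text_seq)
            ha, ya = self.audio(audio_seq)
            # g near 1 trusts the text decision, near 0 the audio decision.
            g = torch.sigmoid(self.gate(torch.cat([ht, ha], dim=-1)))
            return g * yt + (1.0 - g) * ya

Setting n_out=1 gives the MMSE regression variant; n_out=2 gives the diagnosis classifier.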

* Proc. Interspeech 2020, 2187-2191  

Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning

Nov 13, 2020
Morteza Rohanian, Julian Hough

We present a multi-task learning framework that trains one universal incremental dialogue processing model on four tasks: disfluency detection, language modelling, part-of-speech tagging, and utterance segmentation, in a simple deep recurrent setting. We show that these tasks provide positive inductive biases to each other, with each task's optimal contribution depending on the severity of the noise from that task. Our live multi-task model outperforms its single-task counterparts, delivers competitive performance, and is a promising basis for conversational agents in psychiatric treatment.
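
The shared-encoder arrangement can be sketched as follows (an assumed simplification, not the released model; the label-set sizes are placeholders). A single unidirectional LSTM, which is what keeps processing incremental, feeds four per-token heads, and training sums the four cross-entropy losses with per-task weights.

    import torch
    import torch.nn as nn

    class MultiTaskIncrementalLM(nn.Module):
        def __init__(self, vocab, emb=100, hidden=128,
                     n_disfl=5, n_pos=45, n_seg=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            # Unidirectional, so every prediction uses only the words so far.
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.disfl = nn.Linear(hidden, n_disfl)   # disfluency tags
            self.lm = nn.Linear(hidden, vocab)        # next-word prediction
            self.pos = nn.Linear(hidden, n_pos)       # POS tags
            self.seg = nn.Linear(hidden, n_seg)       # utterance boundaries

        def forward(self, tokens):
            h, _ = self.encoder(self.embed(tokens))   # (B, T, hidden)
            return self.disfl(h), self.lm(h), self.pos(h), self.seg(h)

    # Joint objective, one weight per task:
    # loss = w_d*CE(disfl) + w_l*CE(lm) + w_p*CE(pos) + w_s*CE(seg)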

* The 28th International Conference on Computational Linguistics (COLING 2020)  

Exploring Semantic Incrementality with Dynamic Syntax and Vector Space Semantics

Nov 01, 2018
Mehrnoosh Sadrzadeh, Matthew Purver, Julian Hough, Ruth Kempson

One of the fundamental requirements for models of semantic processing in dialogue is incrementality: a model must reflect how people interpret and generate language at least on a word-by-word basis, and handle phenomena such as fragments and incomplete or jointly-produced utterances. We show that the incremental word-by-word parsing process of Dynamic Syntax (DS) can be assigned a compositional distributional semantics, with the composition operator of DS corresponding to the general operation of tensor contraction from multilinear algebra. We provide abstract semantic decorations for the nodes of DS trees in terms of vectors, tensors, and sums thereof, using the latter to model the underspecified elements crucial to assigning partial representations during incremental processing. As a working example, we give an instantiation of this theory using the plausibility tensors of compositional distributional semantics, and show how our framework can incrementally assign a semantic plausibility measure as it parses phrases and sentences.
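
The core operation can be illustrated with a toy numpy example (all numbers and dimensions are invented): a transitive verb represented as a matrix is contracted with subject and object vectors to give a plausibility score, and leaving the object index free models the underspecified slot mid-parse.

    import numpy as np

    d = 4                                  # toy noun-space dimension
    rng = np.random.default_rng(0)

    subj = rng.random(d)                   # e.g. "dogs"
    verb = rng.random((d, d))              # e.g. "chase" as an order-2 tensor
    obj = rng.random(d)                    # e.g. "cats"

    # Full contraction s_i V_ij o_j: plausibility of "dogs chase cats".
    plausibility = np.einsum('i,ij,j->', subj, verb, obj)

    # Incremental state after "dogs chase ...": the object index is left
    # uncontracted, giving a partial representation over possible objects.
    partial = np.einsum('i,ij->j', subj, verb)

    print(plausibility, partial)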

* accepted in SemDial 2018: https://semdial.hypotheses.org/program/accepted-papers 

Strongly Incremental Repair Detection

Aug 29, 2014
Julian Hough, Matthew Purver

We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the different stages of repairs. Results on the Switchboard disfluency-tagged corpus show utterance-final accuracy on a par with state-of-the-art incremental repair detection methods, but with better incremental accuracy, faster time-to-detection, and less computational overhead. We evaluate its performance using incremental metrics and propose new evaluation standards for repair processing.
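
The flavour of those decision features can be shown with a toy bigram model (a hypothetical re-implementation sketch, not STIR itself): word-by-word surprisal is computed incrementally, and a sudden spike, as at a repair onset, becomes a feature for the downstream classifiers.

    import math
    from collections import Counter

    def train_bigram(corpus):
        unigrams, bigrams = Counter(), Counter()
        for sent in corpus:
            toks = ['<s>'] + sent
            unigrams.update(toks)
            bigrams.update(zip(toks, toks[1:]))
        v = len(unigrams) + 1
        def prob(w, prev, alpha=0.1):
            # add-alpha smoothed conditional probability P(w | prev)
            return (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * v)
        return prob

    corpus = [['i', 'like', 'the', 'red', 'uh', 'the', 'blue', 'one']]
    prob = train_bigram(corpus)

    prev = '<s>'
    for w in corpus[0]:
        surprisal = -math.log2(prob(w, prev))  # one STIR-style feature
        print(f'{w:>5}  {surprisal:5.2f}')
        prev = w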

* 12 pages, 6 figures, EMNLP conference long paper 2014 