Giuseppe Riccardi

Let's Give a Voice to Conversational Agents in Virtual Reality

Aug 04, 2023

Michele Yin, Gabriel Roccabruna, Abhinav Azad, Giuseppe Riccardi

Abstract:The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and adding custom or cloud-based Speech-To-Text and Text-To-Speech models to make the interaction voice-based. Using this architecture, we present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
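
The plug-in idea can be illustrated with a minimal Python sketch, assuming each component exposes a single method. The interface names (`SpeechToText`, `ConversationalAgent`, `TextToSpeech`, `VoicePipeline`) and the toy implementations are hypothetical; they are not taken from the released architecture.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class ConversationalAgent(Protocol):
    def respond(self, user_text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Chains hypothetical STT, agent and TTS components; each one can be swapped."""
    stt: SpeechToText
    agent: ConversationalAgent
    tts: TextToSpeech

    def turn(self, audio_in: bytes) -> tuple[str, str, bytes]:
        user_text = self.stt.transcribe(audio_in)  # speech -> text
        reply = self.agent.respond(user_text)      # domain-specific dialogue logic
        audio_out = self.tts.synthesize(reply)     # text -> speech for the VR client
        return user_text, reply, audio_out


# Toy stand-ins so the sketch runs without any cloud service.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class GreeterAgent:
    def respond(self, user_text: str) -> str:
        return f"You said: {user_text}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(EchoSTT(), GreeterAgent(), BytesTTS())
print(pipeline.turn(b"hello"))
```

Because the pipeline depends only on the three method signatures, a cloud STT service or an agent from a different domain could be swapped in without touching the rest of the chain.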

Understanding Emotion Valence is a Joint Deep Learning Task

May 27, 2023

Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi

Abstract:The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) in single-task, two-step, and joint settings for the valence and EC prediction tasks. We compare and evaluate the performance of generative (GPT-2) and discriminative (BERT) architectures in each setting. We observed that providing the ground-truth label of one task improves the models' prediction performance on the other task. We further observed that the discriminative model achieves the best trade-off between the valence and EC prediction tasks in the joint prediction setting. As a result, we attain a single model that performs both tasks, thus saving computational resources at training and inference time.
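
In the joint setting, a single model predicts both the valence and the emotion carriers. A common way to train such a model is to minimize a weighted sum of the two task losses. The sketch below assumes valence is a sentence-level class and EC detection is binary token tagging; the function names and the mixing weight `alpha` are hypothetical, and the paper's actual objective may differ.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def joint_loss(
    valence_probs: Sequence[float],
    valence_label: int,
    ec_token_probs: Sequence[Sequence[float]],
    ec_token_labels: Sequence[int],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of a sentence-level valence loss and a mean token-level EC loss.

    alpha is a hypothetical mixing weight (1.0 = valence only, 0.0 = EC only).
    """
    if not ec_token_labels or len(ec_token_probs) != len(ec_token_labels):
        raise ValueError("need one probability vector per labeled token")
    valence_term = cross_entropy(valence_probs, valence_label)
    ec_term = sum(
        cross_entropy(p, y) for p, y in zip(ec_token_probs, ec_token_labels)
    ) / len(ec_token_labels)
    return alpha * valence_term + (1 - alpha) * ec_term


# Valence classes: negative, neutral, positive; EC tags per token: 0 = outside, 1 = carrier.
loss = joint_loss([0.1, 0.2, 0.7], 2, [[0.9, 0.1], [0.2, 0.8]], [0, 1])
print(round(loss, 4))
```

Sharing one encoder and summing the losses is what lets a single model serve both tasks at inference time.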

Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?

May 25, 2023

Seyed Mahed Mousavi, Simone Caldarella, Giuseppe Riccardi

Abstract:Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g., weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness, and engagement of the user.
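
Grounding a PLM on a graph usually requires turning the graph into text the model can read. The sketch below linearizes a toy graph of events and participants into bracketed triples and prepends it to the dialogue history. The `EventGraph` class, the triple format, and the `knowledge:`/`dialogue:` prefixes are hypothetical and not necessarily the representation used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class EventGraph:
    """Toy personal-knowledge graph: events linked to the participants they mention."""
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (event, relation, participant)

    def add(self, event: str, relation: str, participant: str) -> None:
        self.edges.append((event, relation, participant))

    def linearize(self) -> str:
        # Hypothetical flat encoding: one bracketed triple per edge, in insertion order.
        return " ".join(f"[{e} | {r} | {p}]" for e, r, p in self.edges)


def build_input(graph: EventGraph, history: list[str]) -> str:
    """Prepend the linearized knowledge to the dialogue history for a PLM."""
    return f"knowledge: {graph.linearize()} dialogue: {' '.join(history)}"


g = EventGraph()
g.add("moved to Milan", "with", "sister")
g.add("started new job", "at", "hospital")
print(build_input(g, ["How was last week?", "Tiring, but good."]))
```

The resulting string would then be tokenized and fed to the fine-tuned model in place of the bare dialogue history.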

What's New? Identifying the Unfolding of New Events in Narratives

Feb 20, 2023

Seyed Mahed Mousavi, Shohei Tanaka, Gabriel Roccabruna, Koichiro Yoshino, Satoshi Nakamura, Giuseppe Riccardi

Abstract:Narratives are a rich source of events unfolding over time and context. Automatic understanding of these events may provide a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. An event is categorized as new with respect to the discourse context and to whether it can be inferred through commonsense reasoning. We used human annotators to annotate a publicly available corpus of narratives with the new events at sentence level. We present the annotation protocol and a study aimed at validating the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.
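
The triplet definition suggests a simple data structure. The sketch below represents each event as a (subject, predicate, object) tuple and flags it as new when the same normalized triplet has not appeared earlier in the narrative. This string-matching rule is a hypothetical naive baseline, not one of the paper's models: it ignores paraphrase and commonsense inference, both of which the task definition explicitly considers.

```python
from typing import NamedTuple


class Event(NamedTuple):
    subject: str
    predicate: str
    obj: str


def normalize(event: Event) -> Event:
    """Lowercase and strip every slot so trivial surface variants compare equal."""
    return Event(*(part.strip().lower() for part in event))


def flag_new_events(sentences: list[list[Event]]) -> list[list[bool]]:
    """Mark each event as new if no identical (normalized) triplet appeared before.

    Naive string-matching baseline: paraphrases and events inferable through
    commonsense reasoning are NOT recognized as already known.
    """
    seen: set[Event] = set()
    flags: list[list[bool]] = []
    for events in sentences:
        sentence_flags = []
        for event in events:
            key = normalize(event)
            sentence_flags.append(key not in seen)
            seen.add(key)
        flags.append(sentence_flags)
    return flags


narrative = [
    [Event("I", "lost", "my keys")],
    [Event("I", "lost", "My keys "), Event("neighbor", "found", "keys")],
]
print(flag_new_events(narrative))
```

A learned model would replace the exact-match test with a classifier that also sees the discourse context.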

What can Speech and Language Tell us About the Working Alliance in Psychotherapy

Jun 27, 2022

Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

Abstract:We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy that allows the therapist to help the patient identify and modify maladaptive thoughts, behaviors, or actions. This cooperative effort can be evaluated using the Observer-rated Shortened version of the Working Alliance Inventory (WAI), a 12-item inventory covering task, goal, and relationship; the working alliance it measures has an important influence on therapeutic outcomes. In this work, we investigate the relation between this alliance inventory and the spoken conversations (sessions) between the patient and the psychotherapist. We delivered eight weeks of e-therapy, collected the audio and video call sessions, and manually transcribed them. Professional therapists annotated and evaluated the spoken conversations with WAI ratings. We investigated speech and language features and their association with WAI items. The feature types include turn dynamics, lexical entrainment, and conversational descriptors extracted from the speech and language signals. Our findings provide strong evidence that some of these features are strong indicators of working alliance. To the best of our knowledge, this is the first study to exploit speech and language for characterising working alliance.

* Accepted at Interspeech 2022
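
Lexical entrainment, the tendency of conversation partners to reuse each other's words, can be quantified in many ways. A minimal sketch of one simple proxy, assuming regex tokenization and word types rather than tokens: at each change of speaker, compute the share of the current turn's word types that also occur in the previous turn, then average. The function names and the measure itself are illustrative assumptions, not necessarily the features used in the study.

```python
import re


def word_types(text: str) -> set[str]:
    """Lowercased word types; punctuation is dropped."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_entrainment(turns: list[tuple[str, str]]) -> float:
    """Average share of a turn's word types reused from the other speaker's previous turn.

    turns: (speaker, text) pairs in chronological order. Only consecutive turns
    with a change of speaker are scored; returns 0.0 if there are none.
    """
    scores = []
    for (prev_spk, prev_txt), (spk, txt) in zip(turns, turns[1:]):
        if spk == prev_spk:
            continue
        current = word_types(txt)
        if current:
            scores.append(len(current & word_types(prev_txt)) / len(current))
    return sum(scores) / len(scores) if scores else 0.0


session = [
    ("therapist", "How did the exercise go this week?"),
    ("patient", "The exercise went better this week."),
    ("therapist", "Better in what way?"),
]
print(round(lexical_entrainment(session), 3))
```

Computed per session, such a score could then be correlated with the WAI items rated for that session.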

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Dec 13, 2021

Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi

Abstract:Personal narratives (PN) - spoken or written - are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC) defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such EC can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. In previous work, it has been shown that EC can be identified using lexical features. However, spoken narratives should provide a richer description of the context and the users' emotional state. In this paper, we leverage word-based acoustic and textual embeddings as well as early and late fusion techniques for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect EC. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.

* Accepted at ASRU 2021 https://asru2021.org/
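
The difference between the two fusion strategies can be sketched in a few lines. Early fusion concatenates the word-level acoustic and textual vectors before a single classifier; late fusion lets each modality's model score every word and then combines the scores. The weighted-averaging rule, `w_acoustic`, and `threshold` below are hypothetical choices; the paper compares several combination strategies that are not reproduced here.

```python
from typing import Sequence


def early_fusion(acoustic: Sequence[float], textual: Sequence[float]) -> list[float]:
    """Early fusion: one joint feature vector per word, fed to a single classifier."""
    return list(acoustic) + list(textual)


def late_fusion(
    p_acoustic: Sequence[float],
    p_textual: Sequence[float],
    w_acoustic: float = 0.5,
    threshold: float = 0.5,
) -> list[bool]:
    """Late fusion: weighted average of per-word EC probabilities from two models.

    w_acoustic and threshold are hypothetical hyperparameters.
    """
    if len(p_acoustic) != len(p_textual):
        raise ValueError("both models must score the same words")
    return [
        w_acoustic * pa + (1 - w_acoustic) * pt >= threshold
        for pa, pt in zip(p_acoustic, p_textual)
    ]


# Per-word EC probabilities from an acoustic (ResNet-style) and a textual model.
print(late_fusion([0.9, 0.2, 0.6], [0.7, 0.1, 0.3]))
```

Late fusion keeps the two models independent, so each can be trained on its own data (e.g. an acoustic model pretrained on speech emotion corpora) before combination.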

Evaluation of Interpretability for Deep Learning algorithms in EEG Emotion Recognition: A case study in Autism

Nov 25, 2021

Juan Manuel Mayor-Torres, Sara Medina-DeVilliers, Tessa Clarkson, Matthew D. Lerner, Giuseppe Riccardi

Abstract:Current Explainable Artificial Intelligence (XAI) models have shown an evident and quantified lack of reliability in measuring feature relevance when statistically entangled features are used to train deep classifiers. Deep Learning is increasingly applied in clinical trials for the early diagnosis of neuro-developmental disorders, such as Autism Spectrum Disorder (ASD). However, the use of more reliable saliency maps to obtain trustworthy and interpretable metrics from neural activity features is not yet mature enough for practical applications in diagnostics or clinical trials. Moreover, in ASD research the use of deep classifiers that rely on neural measures to predict viewed facial emotions is relatively unexplored. Therefore, in this study we evaluate a Convolutional Neural Network (CNN) for electroencephalography (EEG)-based facial emotion recognition decoding, complemented with a novel RemOve-And-Retrain (ROAR) methodology to recover the highly relevant features used by the classifier. Specifically, we compare well-known relevance maps such as Layer-Wise Relevance Propagation (LRP), PatternNet, Pattern Attribution, and Smooth-Grad Squared. This study is the first to consolidate a more transparent feature-relevance calculation for successful EEG-based facial emotion recognition using a within-subject-trained CNN in typically developed and ASD individuals.
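
The core RemOve-And-Retrain loop can be sketched as follows, assuming per-sample relevance maps with the same shape as the input features. For each sample, the top fraction of features ranked by relevance is replaced by that feature's dataset mean, the classifier is retrained from scratch, and its score is recorded; a relevance method that identifies truly informative features should cause a steeper drop. `train_and_evaluate` is a hypothetical caller-supplied function, mean replacement is an assumption, and the paper's specific ROAR variant is not reproduced.

```python
from statistics import mean
from typing import Callable, Sequence

Dataset = list[list[float]]


def remove_top_features(
    data: Dataset, relevance: Sequence[Sequence[float]], fraction: float
) -> Dataset:
    """Per sample, replace the most relevant features with their column (dataset) mean."""
    n_features = len(data[0])
    k = round(fraction * n_features)
    column_means = [mean(row[j] for row in data) for j in range(n_features)]
    degraded = []
    for row, rel in zip(data, relevance):
        ranked = sorted(range(n_features), key=rel.__getitem__, reverse=True)
        top = set(ranked[:k])
        degraded.append([column_means[j] if j in top else x for j, x in enumerate(row)])
    return degraded


def roar_curve(
    data: Dataset,
    labels: Sequence[int],
    relevance: Sequence[Sequence[float]],
    train_and_evaluate: Callable[[Dataset, Sequence[int]], float],
    fractions: Sequence[float] = (0.0, 0.1, 0.3, 0.5),
) -> list[tuple[float, float]]:
    """Retrain from scratch after each removal step and record (fraction, score)."""
    return [
        (f, train_and_evaluate(remove_top_features(data, relevance, f), labels))
        for f in fractions
    ]
```

Comparing the resulting curves for LRP, PatternNet, and the other maps gives a retraining-based measure of which map ranks features best, instead of relying on visual inspection of saliency maps.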

Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity

Jul 18, 2021

Juan Manuel Mayor-Torres, Mirco Ravanelli, Sara E. Medina-DeVilliers, Matthew D. Lerner, Giuseppe Riccardi

Abstract:Machine learning methods, such as deep learning, show promising results in the medical domain. However, the lack of interpretability of these algorithms may hinder their applicability to medical decision support systems. This paper studies an interpretable deep learning technique, called SincNet. SincNet is a convolutional neural network that efficiently learns customized band-pass filters through trainable sinc-functions. In this study, we use SincNet to analyze the neural activity of individuals with Autism Spectrum Disorder (ASD), who experience characteristic differences in neural oscillatory activity. In particular, we propose a novel SincNet-based neural network for detecting emotions in ASD patients using EEG signals. The learned filters can be easily inspected to detect which part of the EEG spectrum is used for predicting emotions. We found that our system automatically learns the high-$\alpha$ (9-13 Hz) and $\beta$ (13-30 Hz) band suppression often present in individuals with ASD. This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD. The improved interpretability of SincNet is achieved without sacrificing performance in emotion recognition.
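
SincNet's first layer parameterizes each filter only by its two cutoff frequencies: the kernel is the difference of two ideal low-pass sinc filters, g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with f1 and f2 normalized by the sampling rate and the result multiplied by a window. The sketch below builds one fixed, Hamming-windowed filter for the beta band (13-30 Hz) in plain Python. The 128 Hz sampling rate and the 129-tap length are assumptions for illustration; in SincNet the cutoffs are trainable parameters rather than constants.

```python
import math


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass(f_low_hz: float, f_high_hz: float, fs_hz: float, length: int) -> list[float]:
    """SincNet-style band-pass kernel: difference of two windowed low-pass sinc filters.

    f1, f2 are the cutoffs in cycles/sample; length must be odd so the kernel is centered.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd")
    f1, f2 = f_low_hz / fs_hz, f_high_hz / fs_hz
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))  # Hamming
        kernel.append(g * window)
    return kernel


def gain_at(kernel: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude of the kernel's frequency response at a single frequency."""
    half = len(kernel) // 2
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(k * math.cos(w * (i - half)) for i, k in enumerate(kernel))
    im = sum(k * math.sin(w * (i - half)) for i, k in enumerate(kernel))
    return math.hypot(re, im)


beta = sinc_bandpass(13.0, 30.0, fs_hz=128.0, length=129)
# Gain inside the band (20 Hz) vs. well below it (2 Hz).
print(round(gain_at(beta, 20.0, 128.0), 2), round(gain_at(beta, 2.0, 128.0), 2))
```

Because each learned filter is fully described by its two cutoffs, inspecting which part of the EEG spectrum the network uses reduces to reading off those two numbers per filter.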

Emotion Carrier Recognition from Personal Narratives

Aug 17, 2020

Aniruddha Tammewar, Alessandra Cervone, Giuseppe Riccardi

Abstract:Personal Narratives (PN) - recollections of facts, events, and thoughts from one's own experience - are often used in everyday conversations. So far, PNs have mainly been explored for tasks such as valence prediction or emotion classification (e.g., happy, sad). However, these tasks might overlook more fine-grained information that could nevertheless prove relevant for understanding PNs. In this work, we propose a novel task for Narrative Understanding: Emotion Carrier Recognition (ECR). We argue that automatically recognizing emotion carriers in PNs, i.e., the text fragments that carry the emotions of the narrator (e.g., 'loss of a grandpa', 'high school reunion'), provides the deeper level of emotion analysis needed, for instance, in the mental healthcare domain. We explore the task of ECR using a corpus of PNs manually annotated with emotion carriers and investigate different baseline models for the task. Furthermore, we propose several evaluation strategies for the task. Based on inter-annotator agreement, the task itself was found to be complex and subjective for humans. Nevertheless, we discuss evaluation metrics that could be suitable for applications based on ECR.
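
One of the simplest evaluation strategies for ECR is token-level precision, recall, and F1 against the gold carrier tokens. The sketch below assumes binary token labels (1 = part of an emotion carrier); it is an illustrative choice, not necessarily one of the strategies proposed in the paper, and given the subjectivity of the task, softer span-overlap criteria may be more appropriate.

```python
def token_f1(gold: list[int], predicted: list[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 over tokens tagged as emotion carriers (1) vs not (0)."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must be aligned token by token")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = ["after", "the", "loss", "of", "a", "grandpa", "I", "cried"]
gold = [0, 0, 1, 1, 1, 1, 0, 0]       # carrier: "loss of a grandpa"
predicted = [0, 1, 1, 1, 0, 0, 0, 0]  # model tagged "the loss of"
print(token_f1(gold, predicted))
```

Here the prediction overlaps the gold carrier only partially, which token-level scoring rewards in part, whereas an exact span match would count it as a complete miss.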

Is this Dialogue Coherent? Learning from Dialogue Acts and Entities

Jun 17, 2020

Alessandra Cervone, Giuseppe Riccardi

Abstract:In this work, we investigate the human perception of coherence in open-domain dialogues. In particular, we address the problem of annotating and modeling the coherence of next-turn candidates while considering the entire history of the dialogue. First, we create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings, where ratings of next-turn candidate utterances are provided considering the full dialogue context. Our statistical analysis of the corpus indicates that the perception of turn coherence is affected by the distribution patterns of previously introduced entities and by the Dialogue Acts used. Second, we experiment with different architectures to model entities, Dialogue Acts (DA), and their combination, and evaluate their performance in predicting human coherence ratings on SWBD-Coh. We find that models combining both DA and entity information yield the best performance for both response selection and turn coherence rating.

* Accepted at SIGDIAL 2020
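
To make the combination of entity and Dialogue Act information concrete, the sketch below extracts count features in the spirit of an entity grid: Dialogue Act bigrams between consecutive turns, plus presence transitions (`X` mentioned, `-` absent) for every entity across turns, including the candidate next turn. The `Turn` class and the feature names are hypothetical, and the paper's models are learned architectures rather than this hand-crafted extractor.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    dialogue_act: str
    entities: frozenset[str]


def coherence_features(history: list[Turn], candidate: Turn) -> Counter:
    """Count DA bigrams and entity presence transitions over history + candidate.

    Entity transitions are 'XX' (mentioned in both turns), 'X-' (dropped) and
    '-X' (introduced); entities absent from both turns of a pair are skipped.
    """
    turns = history + [candidate]
    entities = set().union(*(t.entities for t in turns))
    features: Counter = Counter()
    for prev, curr in zip(turns, turns[1:]):
        features[f"da:{prev.dialogue_act}>{curr.dialogue_act}"] += 1
        for e in entities:
            a = "X" if e in prev.entities else "-"
            b = "X" if e in curr.entities else "-"
            if a != "-" or b != "-":
                features[f"ent:{a}{b}"] += 1
    return features


history = [
    Turn("question", frozenset({"movie"})),
    Turn("answer", frozenset({"movie", "actor"})),
]
print(coherence_features(history, Turn("statement", frozenset({"actor"}))))
```

Scoring several candidates with the same extractor and ranking them by a classifier's output would correspond to the response-selection setting.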
