Sven Magg

A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks

May 16, 2018

Chandrakant Bothe, Cornelius Weber, Sven Magg, Stefan Wermter

Abstract: Dialogue act recognition is an important part of natural language understanding. We investigate the way dialogue act corpora are annotated and the learning approaches used so far. We find that the dialogue act is context-sensitive within the conversation for most of the classes. Nevertheless, previous models of dialogue act classification work on the utterance-level and only very few consider context. We propose a novel context-based learning method to classify dialogue acts using a character-level language model utterance representation, and we notice significant improvement. We evaluate this method on the Switchboard Dialogue Act corpus, and our results show that the consideration of the preceding utterances as a context of the current utterance improves dialogue act detection.

* Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
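
The idea of using preceding utterances as context can be illustrated with a minimal sketch. It only shows how each utterance might be paired with a fixed number of earlier utterances before classification; the function name `context_windows`, the window size, and the padding value are assumptions made for illustration and are not taken from the paper.

```python
def context_windows(utterances, n_context=2, pad=None):
    """Pair every utterance with its n_context preceding utterances.

    Hypothetical helper: positions before the start of the
    conversation are filled with `pad`.
    """
    windows = []
    for i, current in enumerate(utterances):
        start = max(0, i - n_context)
        context = list(utterances[start:i])
        # Left-pad so every context has exactly n_context entries.
        context = [pad] * (n_context - len(context)) + context
        windows.append((context, current))
    return windows


dialogue = ["Do you like jazz?", "Yes.", "Me too.", "Okay."]
for context, current in context_windows(dialogue):
    print(context, "->", current)
```

A classifier that sees only "Yes." cannot tell an answer from a backchannel; the paired context is what makes the distinction available.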

On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

Apr 06, 2018

Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

Abstract: Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received a lot of attention from the research community. For example, many neural network-based architectures were proposed recently and pushed the performance to a new level. However, the applicability of such neural SER models trained only on in-domain data to noisy conditions is currently under-researched. In this work, we evaluate the robustness of state-of-the-art neural acoustic emotion recognition models in human-robot interaction scenarios. We hypothesize that a robot's ego noise, room conditions, and various acoustic events that can occur in a home environment can significantly affect the performance of a model. We conduct several experiments on the iCub robot platform and propose several novel ways to reduce the gap between the model's performance during training and testing in real-world conditions. Furthermore, we observe large improvements in the model performance on the robot and demonstrate the necessity of introducing several data augmentation techniques like overlaying background noise and loudness variations to improve the robustness of the neural approaches.

* Submitted to IROS'18, Madrid, Spain
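
The two augmentations named in the abstract, overlaying background noise and varying loudness, can be sketched in a few lines. This is a minimal illustration on plain lists of samples, assuming noise is mixed at a chosen signal-to-noise ratio; the function names, the SNR range, and the gain range are assumptions and are not values reported by the authors.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def overlay_noise(speech, noise, snr_db):
    """Mix noise into speech at the requested SNR (in dB)."""
    # Loop the noise clip so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(tiled)
    if noise_rms == 0.0:
        return list(speech)  # silent noise: nothing to add
    # Scale the noise so that rms(speech) / rms(scaled noise) matches snr_db.
    gain = rms(speech) / (10 ** (snr_db / 20)) / noise_rms
    return [s + gain * n for s, n in zip(speech, tiled)]


def vary_loudness(signal, gain_db):
    """Scale the whole signal by gain_db decibels."""
    factor = 10 ** (gain_db / 20)
    return [factor * s for s in signal]


def augment(speech, noise, rng, snr_range=(0.0, 20.0), gain_range=(-6.0, 6.0)):
    """Apply both augmentations with randomly drawn parameters (assumed ranges)."""
    snr_db = rng.uniform(*snr_range)
    gain_db = rng.uniform(*gain_range)
    return vary_loudness(overlay_noise(speech, noise, snr_db), gain_db)


speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [random.Random(1).uniform(-1.0, 1.0) for _ in range(100)]
augmented = augment(speech, noise, random.Random(0))
```

Drawing a fresh SNR and gain for every training example exposes the model to a range of conditions rather than a single noise level.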

EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning

Apr 03, 2018

Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

Abstract: Acoustically expressed emotions can make communication with a robot more efficient. Detecting emotions like anger could provide a clue for the robot indicating unsafe/undesired situations. Recently, several deep neural network-based models have been proposed which establish new state-of-the-art results in affective state evaluation. These models typically start processing at the end of each utterance, which not only requires a mechanism to detect the end of an utterance but also makes it difficult to use them in a real-time communication scenario, e.g. human-robot interaction. We propose the EmoRL model that triggers an emotion classification as soon as it gains enough confidence while listening to a person speaking. As a result, we minimize the need for segmenting the audio signal for classification and achieve lower latency as the audio signal is processed incrementally. The method is competitive with the accuracy of a strong baseline model, while allowing much earlier prediction.

* Accepted to the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 2018
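
The trade-off between waiting for more audio and predicting early can be illustrated with a much simpler stand-in. The sketch below is not the EmoRL model: where the paper learns the decision with reinforcement learning, this example uses a fixed confidence threshold on a running average of per-frame class probabilities. The function name `classify_early`, the threshold, and the input format are all assumptions.

```python
def classify_early(frame_probs, threshold=0.8):
    """Return (class_index, frames_consumed).

    Hypothetical stand-in for a learned trigger: stop as soon as the
    running mean of per-frame class probabilities exceeds `threshold`,
    otherwise fall back to the best class at the end of the stream.
    """
    running = None
    best = None
    frames = 0
    for frames, probs in enumerate(frame_probs, start=1):
        if running is None:
            running = list(probs)
        else:
            # Incremental mean over all frames seen so far.
            running = [r + (p - r) / frames for r, p in zip(running, probs)]
        best = max(range(len(running)), key=running.__getitem__)
        if running[best] >= threshold:
            return best, frames  # confident enough: predict early
    if best is None:
        raise ValueError("frame_probs must contain at least one frame")
    return best, frames


stream = [[0.5, 0.5], [0.9, 0.1], [0.95, 0.05], [0.9, 0.1]]
label, used = classify_early(stream, threshold=0.75)
print(label, used)
```

A lower threshold yields earlier but less reliable predictions; a learned policy replaces this hand-set value with a decision trained against both accuracy and latency.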

Reusing Neural Speech Representations for Auditory Emotion Recognition

Mar 30, 2018

Egor Lakomkin, Cornelius Weber, Sven Magg, Stefan Wermter

Abstract: Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models. The difficulties come from the scarcity of training data, general subjectivity in emotion perception resulting in low annotator agreement, and the uncertainty about which features are the most relevant and robust ones for classification. In this paper, we tackle the last of these problems. Inspired by the recent success of transfer learning methods, we propose a set of architectures which utilize neural representations inferred by training on large speech databases for the acoustic emotion recognition task. Our experiments on the IEMOCAP dataset show a ~10% relative improvement in accuracy and F1-score over the baseline recurrent neural network which is trained end-to-end for emotion recognition.
