Stefan Wermter

Visual Distant Supervision for Scene Graph Generation

Mar 29, 2021
Yuan Yao, Ao Zhang, Xu Han, Mengdi Li, Cornelius Weber, Zhiyuan Liu, Stefan Wermter, Maosong Sun

Scene graph generation aims to identify objects and their relations in images, providing structured image representations that can facilitate numerous applications in computer vision. However, scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation. In this work, we propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data. The intuition is that by aligning commonsense knowledge bases and images, we can automatically create large-scale labeled data to provide distant supervision for visual relation learning. To alleviate the noise in distantly labeled data, we further propose a framework that iteratively estimates the probabilistic relation labels and eliminates the noisy ones. Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines. By further incorporating human-labeled data in a semi-supervised fashion, our model outperforms state-of-the-art fully supervised models by a large margin (e.g., 8.6 micro- and 7.6 macro-recall@50 improvements for predicate classification in Visual Genome evaluation). All the data and code will be available to facilitate future research.
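The alignment idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the knowledge base format or the alignment rule, so everything here is an assumption for illustration: the toy `KNOWLEDGE_BASE`, the function name `distant_labels`, and the rule that every ordered pair of detected objects receives all relations the knowledge base lists for its class pair. The resulting labels are noisy by construction, which is the noise the paper's iterative denoising framework is meant to reduce.

```python
from itertools import permutations

# Hypothetical toy knowledge base (an assumption, not the paper's data):
# (subject class, object class) -> candidate relation labels.
KNOWLEDGE_BASE = {
    ("person", "horse"): {"riding", "feeding"},
    ("cup", "table"): {"on"},
}


def distant_labels(detected_objects, knowledge_base=KNOWLEDGE_BASE):
    """Assign candidate relations to every ordered pair of detected objects.

    detected_objects: list of class names, one per detected object.
    Returns {(subject_index, object_index): set of candidate relations}.
    """
    labels = {}
    for (i, subj), (j, obj) in permutations(enumerate(detected_objects), 2):
        relations = knowledge_base.get((subj, obj))
        if relations:
            # Copy the set so callers cannot mutate the knowledge base.
            labels[(i, j)] = set(relations)
    return labels


if __name__ == "__main__":
    print(distant_labels(["person", "horse", "table"]))
```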

* 14 pages, 6 figures

Exercise with Social Robots: Companion or Coach?

Mar 24, 2021
Sascha Griffiths, Tayfun Alpay, Alexander Sutherland, Matthias Kerzel, Manfred Eppe, Erik Strahl, Stefan Wermter

In this paper, we investigate the roles that social robots can take in physical exercise with human partners. In related work, robots or virtual intelligent agents take the role of a coach or instructor, whereas in other approaches they are used as motivational aids. These are two "paradigms", so to speak, within the small but growing area of robots for social exercise. We designed an online questionnaire to test whether people would prefer to see robots in the role of companion or coach. The questionnaire asks people to imagine working out with a robot and draws on three existing questionnaires: (1) CART-Q, which is used for judging coach-athlete relationships, (2) the mind perception questionnaire, and (3) the System Usability Scale (SUS). We present the methodology, some preliminary results, and our intended future work on personal robots for coaching.

* 6 pages, 5 figures, Found in Proceedings of Workshop on Personal Robots for Exercising and Coaching at the HRI 2018 (HRI2018)

A Sub-Layered Hierarchical Pyramidal Neural Architecture for Facial Expression Recognition

Mar 23, 2021
Henrique Siqueira, Pablo Barros, Sven Magg, Cornelius Weber, Stefan Wermter

In domains where computational resources and labeled data are limited, such as in robotics, deep networks with millions of weights might not be the optimal solution. In this paper, we introduce a connectivity scheme for pyramidal architectures to increase their capacity for learning features. Experiments on facial expression recognition of unseen people demonstrate that our approach is a potential candidate for applications with restricted resources, due to good generalization performance and low computational cost. We show that our approach generalizes as well as convolutional architectures in this task but uses fewer trainable parameters and is more robust for low-resolution faces.

Disambiguating Affective Stimulus Associations for Robot Perception and Dialogue

Mar 05, 2021
Henrique Siqueira, Alexander Sutherland, Pablo Barros, Matthias Kerzel, Sven Magg, Stefan Wermter

Effectively recognising and applying emotions to interactions is a highly desirable trait for social robots. Implicitly understanding how subjects experience different kinds of actions and objects in the world is crucial for natural human-robot interaction (HRI), with the possibility to perform positive actions and avoid negative actions. In this paper, we utilize the NICO robot's appearance and capabilities to give NICO the ability to model a coherent affective association between a perceived auditory stimulus and a temporally asynchronous emotion expression. This is done by combining evaluations of emotional valence from vision and language. NICO uses this information to make decisions about when to extend conversations in order to accrue more affective information if the representation of the association is not coherent. Our primary contribution is providing a NICO robot with the ability to learn the affective associations between a perceived auditory stimulus and an emotional expression. NICO is able to do this for both individual subjects and specific stimuli, with the aid of an emotion-driven dialogue system that rectifies emotional expression incoherences. The robot is then able to use this information to determine a subject's enjoyment of perceived auditory stimuli in a real HRI scenario.

An Ensemble with Shared Representations Based on Convolutional Networks for Continually Learning Facial Expressions

Mar 05, 2021
Henrique Siqueira, Pablo Barros, Sven Magg, Stefan Wermter

Social robots able to continually learn facial expressions could progressively improve their emotion recognition capability towards people interacting with them. Semi-supervised learning through ensemble predictions is an efficient strategy to leverage the high exposure to unlabelled facial expressions during human-robot interactions. Traditional ensemble-based systems, however, are composed of several independent classifiers, leading to a high degree of redundancy and unnecessary allocation of computational resources. In this paper, we propose an ensemble based on convolutional networks where the early layers are strong low-level feature extractors and their representations are shared with an ensemble of convolutional branches. This results in a significant drop in redundant low-level feature processing. Training in a semi-supervised setting, we show that our approach is able to continually learn facial expressions through ensemble predictions using unlabelled samples from different data distributions.
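The shared-representation idea described above can be sketched in a few lines. This is an illustrative forward pass only, under stated assumptions: dense layers stand in for the convolutional layers, weights are random rather than trained, and the class name `SharedEnsemble`, its parameters, and the averaging of branch outputs are hypothetical choices rather than the authors' implementation. The point it shows is that the shared base is computed once per input and reused by every branch.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SharedEnsemble:
    """Toy ensemble: one shared base layer feeding several independent branches."""

    def __init__(self, in_dim, shared_dim, n_classes, n_branches, seed=0):
        rng = np.random.default_rng(seed)
        # Shared low-level feature extractor (dense stand-in for early conv layers).
        self.shared_w = rng.normal(scale=0.1, size=(in_dim, shared_dim))
        # One classifier head per ensemble branch.
        self.branch_ws = [
            rng.normal(scale=0.1, size=(shared_dim, n_classes))
            for _ in range(n_branches)
        ]

    def predict_proba(self, x):
        # Shared features are computed once, then reused by all branches.
        shared = np.maximum(x @ self.shared_w, 0.0)  # ReLU
        branch_probs = [softmax(shared @ w) for w in self.branch_ws]
        # Ensemble prediction: mean of the branch probabilities.
        return np.mean(branch_probs, axis=0)


if __name__ == "__main__":
    model = SharedEnsemble(in_dim=16, shared_dim=8, n_classes=7, n_branches=4)
    batch = np.random.default_rng(1).normal(size=(5, 16))
    print(model.predict_proba(batch).shape)
```

In a semi-supervised setting, the averaged prediction could serve as a pseudo-label for unlabelled samples; how the paper selects or weights such labels is not stated in the abstract.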

Continual Learning from Synthetic Data for a Humanoid Exercise Robot

Feb 19, 2021
Nicolas Duczek, Matthias Kerzel, Stefan Wermter

To detect and correct physical exercises, a Grow-When-Required Network (GWR) with recurrent connections, episodic memory and a novel subnode mechanism is developed to learn spatiotemporal relationships of body movements and poses. Once an exercise is performed, the information of pose and movement per frame is stored in the GWR. For every frame, the current pose and motion pair is compared against a predicted output of the GWR, allowing for feedback not only on the pose but also on the velocity of the motion. In a practical scenario, a physical exercise is performed by an expert like a physiotherapist and then used as a reference for a humanoid robot like Pepper to give feedback on a patient's execution of the same exercise. This approach, however, comes with two challenges. First, the distance from the humanoid robot and the position of the user in the robot's camera view have to be considered by the GWR as well, requiring robustness against the user's positioning in the robot's field of view. Second, since both the pose and motion are dependent on the body measurements of the original performer, the expert's exercise cannot be easily used as a reference. This paper tackles the first challenge by designing an architecture that allows for tolerances in translation and rotations regarding the center of the field of view. For the second challenge, we allow the GWR to grow online on incremental data. For evaluation, we created a novel exercise dataset with virtual avatars called the Virtual-Squat dataset. Overall, we claim that our novel architecture based on the GWR can use a learned exercise reference for different body variations through continual online learning, while preventing catastrophic forgetting, enabling an engaging long-term human-robot interaction with a humanoid robot.

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Feb 17, 2021
Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to noise presence, especially in low signal-to-noise ratios (SNRs). To increase the robustness of the VAE, we propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs. We evaluate our approach on real recordings of different noisy environments and acoustic conditions using two different noise datasets. We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters. At the same time, we demonstrate that our model is capable of generalizing to unseen noise conditions better than a supervised feedforward deep neural network (DNN). Furthermore, we demonstrate the robustness of the model performance to a reduction of the noisy-clean speech training data size.

* ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision

Feb 10, 2021
Julien Scholz, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero Algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and unsupervised, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and simpler consistency loss provides a significant performance increase in OpenAI Gym environments. Our modifications also enable self-supervised pretraining for MuZero, so the algorithm can learn about environment dynamics before a goal is made available.
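The two auxiliary terms mentioned above can be sketched as follows. The abstract gives no formulas, so this is one plausible reading under stated assumptions: the reconstruction loss compares a decoded internal state with the observation it was encoded from, and the consistency loss compares the internal state predicted by the dynamics model with the encoding of the actual next observation. The names `encode`, `dynamics`, `decode`, and `auxiliary_losses`, and the use of mean squared error, are hypothetical choices for illustration.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def auxiliary_losses(obs, next_obs, action, encode, dynamics, decode):
    """Return (reconstruction_loss, consistency_loss) for one transition.

    encode:   observation -> internal state
    dynamics: (internal state, action) -> predicted next internal state
    decode:   internal state -> reconstructed observation
    """
    state = encode(obs)
    # Reconstruction: the internal state should retain enough information
    # to recover the observation it came from.
    reconstruction = mse(decode(state), obs)
    # Consistency: the predicted next state should match the encoding of
    # the observation the environment actually produced.
    predicted_next_state = dynamics(state, action)
    consistency = mse(predicted_next_state, encode(next_obs))
    return reconstruction, consistency


if __name__ == "__main__":
    identity = lambda x: x
    step = lambda s, a: s + a
    print(auxiliary_losses(np.zeros(3), np.ones(3), np.ones(3),
                           identity, step, identity))
```

Neither term needs a reward signal, which is consistent with the abstract's remark that the modifications allow pretraining before a goal is available.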

CharacterGAN: Few-Shot Keypoint Character Animation and Reposing

Feb 05, 2021
Tobias Hinz, Matthew Fisher, Oliver Wang, Eli Shechtman, Stefan Wermter

We introduce CharacterGAN, a generative model that can be trained on only a few samples (8 - 15) of a given character. Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback, allowing for intuitive reposing and animation. Since we only have very limited training samples, one of the key challenges lies in how to address (dis)occlusions, e.g. when a hand moves behind or in front of a body. To address this, we introduce a novel layering approach which explicitly splits the input keypoints into different layers which are processed independently. These layers represent different parts of the character and provide a strong implicit bias that helps to obtain realistic results even with strong (dis)occlusions. To combine the features of individual layers we use an adaptive scaling approach conditioned on all keypoints. Finally, we introduce a mask connectivity constraint to reduce distortion artifacts that occur with extreme out-of-distribution poses at test time. We show that our approach outperforms recent baselines and creates realistic animations for diverse characters. We also show that our model can handle discrete state changes, for example a profile facing left or right, that the different layers do indeed learn features specific for the respective keypoints in those layers, and that our model scales to larger datasets when more data is available.

* Code and supplementary material can be found at https://github.com/tohinz/CharacterGAN

Hierarchical principles of embodied reinforcement learning: A review

Dec 18, 2020
Manfred Eppe, Christian Gumbsch, Matthias Kerzel, Phuong D. H. Nguyen, Martin V. Butz, Stefan Wermter

Cognitive Psychology and related disciplines have identified several critical mechanisms that enable intelligent biological agents to learn to solve complex problems. There exists pressing evidence that the cognitive mechanisms that enable problem-solving skills in these species build on hierarchical mental representations. Among the most promising computational approaches to provide comparable learning-based problem-solving abilities for artificial agents and robots is hierarchical reinforcement learning. However, so far the existing computational approaches have not been able to equip artificial agents with problem-solving abilities that are comparable to those of intelligent animals, including human and non-human primates, crows, or octopuses. Here, we first survey the literature in Cognitive Psychology and related disciplines, and find that many important mental mechanisms involve compositional abstraction, curiosity, and forward models. We then relate these insights to contemporary hierarchical reinforcement learning methods, and identify the key machine intelligence approaches that realise these mechanisms. As our main result, we show that all important cognitive mechanisms have been implemented independently in isolated computational architectures, and there is simply a lack of approaches that integrate them appropriately. We expect our results to guide the development of more sophisticated cognitively inspired hierarchical methods, so that future artificial agents achieve a problem-solving performance on the level of intelligent animals.
