The purpose of this technical report is twofold. First, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on existing robotics hardware. The tasks include pushing, sliding, and pick-and-place with a Fetch robotic arm, as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which the agent is told what to do through an additional input. Second, it presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
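To make the sparse binary reward and the idea behind Hindsight Experience Replay more concrete, the following is a minimal sketch rather than the report's implementation. The function names, the episode layout, the -1/0 reward convention, and the distance threshold are all illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Binary reward: 0.0 if the goal is reached within `threshold`, else -1.0.

    The -1/0 convention and the threshold value are illustrative assumptions.
    """
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    return 0.0 if np.linalg.norm(diff) <= threshold else -1.0

def relabel_with_final_goal(episode):
    """Hindsight relabeling: treat the finally achieved goal as the intended goal.

    `episode` is a list of dicts with the (hypothetical) keys 'obs', 'action',
    'achieved_goal' (goal achieved after the action) and 'desired_goal'.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for step in episode:
        relabeled.append({
            "obs": step["obs"],
            "action": step["action"],
            "achieved_goal": step["achieved_goal"],
            "desired_goal": new_goal,
            # Recompute the reward with respect to the substituted goal.
            "reward": sparse_reward(step["achieved_goal"], new_goal),
        })
    return relabeled
```

Under these assumptions, an episode that never reaches its original goal still produces at least one transition with a non-negative reward after relabeling, which is what gives the agent a learning signal despite the sparse reward.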
Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods makes it possible to get the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than either traditional RL with action space noise or evolutionary strategies alone.
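The difference between the two kinds of noise can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical linear policy with a tanh output and a fixed noise scale `sigma`; it is not the architecture or noise schedule used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(weights, state):
    """Deterministic linear policy with tanh squashing (hypothetical stand-in)."""
    return np.tanh(weights @ state)

def act_with_action_noise(weights, state, sigma=0.1):
    """Action-space noise: fresh Gaussian noise is added to every action."""
    action = policy(weights, state)
    return action + sigma * rng.standard_normal(action.shape)

def perturb_parameters(weights, sigma=0.1):
    """Parameter-space noise: perturb the weights once, e.g. at episode start."""
    return weights + sigma * rng.standard_normal(weights.shape)

weights = rng.standard_normal((2, 3))
state = np.array([0.5, -1.0, 0.25])

# Action noise: the same state yields a different action on each call.
a1 = act_with_action_noise(weights, state)
a2 = act_with_action_noise(weights, state)

# Parameter noise: within one episode the perturbed policy is deterministic,
# so the same state always yields the same action.
noisy_weights = perturb_parameters(weights)
b1 = policy(noisy_weights, state)
b2 = policy(noisy_weights, state)
```

In this sketch the perturbed policy behaves consistently for the whole episode, which is one way to read the abstract's claim of "more consistent exploration", whereas action noise is resampled at every step.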
Human motion plays an important role in many fields. Large databases exist that store and make available recordings of human motions. However, annotating each motion with multiple labels is a cumbersome and error-prone process. This bachelor's thesis presents different approaches to solving the multi-label classification problem using Hidden Markov Models (HMMs). First, different features that can be directly obtained from the raw data are introduced. Next, additional features are derived to improve classification performance. These features are then used to perform the multi-label classification using two different approaches. The first approach simply transforms the multi-label problem into a multi-class problem. The second, novel approach solves the same problem without constructing such a transformation, by predicting the labels directly from the likelihood scores. The second approach scales linearly with the number of labels, whereas the first approach is subject to combinatorial explosion. All aspects of the classification process are evaluated on a data set that consists of 454 motions. System 1 achieves an accuracy of 98.02% and System 2 an accuracy of 93.39% on the test set.
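The scaling argument can be sketched in a few lines. This is an illustrative example only: the label names, the log-likelihood values, and the thresholding rule are assumptions, and the thesis's actual classifier on top of the likelihood scores may differ.

```python
from itertools import combinations

def powerset_classes(labels):
    """Multi-class transformation: every non-empty label subset becomes one class."""
    return [frozenset(c)
            for r in range(1, len(labels) + 1)
            for c in combinations(labels, r)]

def predict_from_scores(log_likelihoods, threshold):
    """Per-label decision: keep each label whose model score exceeds a threshold.

    `log_likelihoods` maps label -> log-likelihood under that label's HMM.
    The thresholding rule is an illustrative assumption.
    """
    return {label for label, score in log_likelihoods.items() if score > threshold}

# Hypothetical labels and scores, for illustration only.
labels = ["walk", "run", "turn-left", "wave"]
classes = powerset_classes(labels)  # up to 2**4 - 1 = 15 candidate classes
scores = {"walk": -120.0, "run": -310.5, "turn-left": -135.2, "wave": -402.7}
predicted = predict_from_scores(scores, threshold=-150.0)
```

With n labels, the transformation can require up to 2^n - 1 classes in the worst case (in practice, only the label combinations that occur in the data), whereas a per-label scheme needs one model and one decision per label.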