Alec Radford

Proximal Policy Optimization Algorithms

Aug 28, 2017

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
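The abstract describes alternating between sampling data and optimizing a surrogate objective over several epochs of minibatch updates. The plain-Python sketch below assumes the clipped surrogate objective from the paper's main text (the abstract itself does not name it), a default clip range eps = 0.2, and per-sample log-probabilities and advantage estimates that have already been computed. The function names `clipped_surrogate` and `minibatch_epochs` are hypothetical, and no autodiff is shown: the code only evaluates the objective that a real implementation would differentiate and shows how one batch of samples is reused for several shuffled epochs.

```python
import math
import random
from typing import Iterator, List, Sequence


def clipped_surrogate(logp_new: Sequence[float], logp_old: Sequence[float],
                      advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A) with r = pi_new / pi_old."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # The minimum is a pessimistic bound: moving the ratio outside
        # [1 - eps, 1 + eps] earns no extra objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def minibatch_epochs(n: int, batch_size: int, epochs: int,
                     seed: int = 0) -> Iterator[List[int]]:
    """Yield shuffled minibatch indices over the same n samples for several epochs."""
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            yield order[start:start + batch_size]  # slice is a copy


# Unchanged policy: the ratio is 1, so the objective is the mean advantage.
print(clipped_surrogate([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # -0.5
```

With a positive advantage, a ratio of 2 is clipped to 1 + eps; with a negative advantage the unclipped, more pessimistic term is kept.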

Learning to Generate Reviews and Discovering Sentiment

Apr 06, 2017

Alec Radford, Rafal Jozefowicz, Ilya Sutskever

Abstract: We explore the properties of byte-level recurrent language models. When given sufficient amounts of capacity, training data, and compute time, the representations learned by these models include disentangled features corresponding to high-level concepts. Specifically, we find a single unit which performs sentiment analysis. These representations, learned in an unsupervised manner, achieve state of the art on the binary subset of the Stanford Sentiment Treebank. They are also very data efficient. When using only a handful of labeled examples, our approach matches the performance of strong baselines trained on full datasets. We also demonstrate the sentiment unit has a direct influence on the generative process of the model. Simply fixing its value to be positive or negative generates samples with the corresponding positive or negative sentiment.
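"Byte-level" means the model reads raw UTF-8 bytes, so its input vocabulary is fixed at 256 symbols and any text, including non-ASCII characters, can be encoded without a tokenizer. The sketch below shows that encoding and, as a hypothetical illustration of "fixing its value", a helper that overwrites one unit of a hidden-state vector before it is fed back into the model. The unit index and clamp value are placeholders rather than values from the paper, all function names are hypothetical, and the recurrent model itself is omitted.

```python
from typing import List

VOCAB_SIZE = 256  # one symbol per possible byte value


def to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 byte ids in [0, 255]."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: List[int]) -> str:
    """Decode byte ids back to text, replacing invalid byte sequences."""
    return bytes(ids).decode("utf-8", errors="replace")


def one_hot(byte_id: int) -> List[float]:
    """One-hot input vector for a single byte."""
    vec = [0.0] * VOCAB_SIZE
    vec[byte_id] = 1.0
    return vec


def clamp_unit(hidden: List[float], unit: int, value: float) -> List[float]:
    """Hypothetical: return a copy of `hidden` with one unit fixed to `value`."""
    clamped = list(hidden)
    clamped[unit] = value
    return clamped


ids = to_byte_ids("café")
print(ids)  # [99, 97, 102, 195, 169]: "é" takes two bytes in UTF-8
```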

Improved Techniques for Training GANs

Jun 10, 2016

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen

Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
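For semi-supervised classification, the paper's main text (not this abstract) turns the discriminator into a classifier with K logits for the real classes plus an implicit "fake" class whose logit is fixed at zero. The probability that an input is real is then D(x) = Z(x) / (Z(x) + 1), where Z(x) = sum over k of exp(l_k(x)), which equals sigmoid(logsumexp(l)). The plain-Python sketch below computes that mapping in a numerically stable way. The function names are hypothetical and the network that produces the logits is omitted.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """log(sum(exp(x))) computed without overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def stable_sigmoid(x: float) -> float:
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_real(logits: Sequence[float]) -> float:
    """D(x) = Z / (Z + 1) with Z = sum(exp(l_k)); equals sigmoid(log Z)."""
    return stable_sigmoid(logsumexp(logits))


def class_probs(logits: Sequence[float]) -> List[float]:
    """p(y = k | x, x is real): softmax over the K real-class logits."""
    lse = logsumexp(logits)
    return [math.exp(l - lse) for l in logits]


# Two classes, both logits zero: Z = 2, so D(x) = 2/3.
print(prob_real([0.0, 0.0]))
```

Because the fake logit is pinned at zero, the K real-class logits alone determine both the class prediction and the real/fake decision.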

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Jan 07, 2016

Alec Radford, Luke Metz, Soumith Chintala

Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs) that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks, demonstrating their applicability as general image representations.

* Under review as a conference paper at ICLR 2016
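Among the DCGAN architectural constraints is building the generator from fractionally-strided (transposed) convolutions instead of pooling or fully connected hidden layers. The sketch below only tracks tensor shapes through such a generator, using the standard transposed-convolution output-size formula (dilation 1, no output padding). The 1024-512-256-128-3 channel schedule, from a 4x4 projection of the noise vector up to a 64x64 RGB image, is modeled on the paper's generator diagram. Kernel size 4, stride 2 and padding 1 are assumptions borrowed from common reimplementations, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_shapes(start: int = 4,
                     channels: Sequence[int] = (1024, 512, 256, 128, 3),
                     kernel: int = 4, stride: int = 2,
                     padding: int = 1) -> List[Tuple[int, int, int]]:
    """(channels, height, width) after each layer, starting from the reshaped projection."""
    size = start
    shapes = [(channels[0], size, size)]  # z projected and reshaped to C x 4 x 4
    for c in channels[1:]:
        size = conv_transpose_out(size, kernel, stride, padding)  # doubles the size
        shapes.append((c, size, size))
    return shapes


for shape in generator_shapes():
    print(shape)
```

With these assumed settings each stride-2 layer doubles the spatial size (4, 8, 16, 32, 64) while the channel count shrinks, ending at three channels for an RGB image.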
