Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alex Graves

DRAW: A Recurrent Neural Network For Image Generation

May 20, 2015

Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

Figure 1 for DRAW: A Recurrent Neural Network For Image Generation

Figure 2 for DRAW: A Recurrent Neural Network For Image Generation

Figure 3 for DRAW: A Recurrent Neural Network For Image Generation

Figure 4 for DRAW: A Recurrent Neural Network For Image Generation

Abstract:This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

Via

Access Paper or Ask Questions

Neural Turing Machines

Dec 10, 2014

Alex Graves, Greg Wayne, Ivo Danihelka

Abstract:We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

Via

Access Paper or Ask Questions

Recurrent Models of Visual Attention

Jun 24, 2014

Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Figure 1 for Recurrent Models of Visual Attention

Figure 2 for Recurrent Models of Visual Attention

Figure 3 for Recurrent Models of Visual Attention

Figure 4 for Recurrent Models of Visual Attention

Abstract:Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.

Via

Access Paper or Ask Questions

Generating Sequences With Recurrent Neural Networks

Jun 05, 2014

Alex Graves

Figure 1 for Generating Sequences With Recurrent Neural Networks

Figure 2 for Generating Sequences With Recurrent Neural Networks

Figure 3 for Generating Sequences With Recurrent Neural Networks

Abstract:This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.

* Thanks to Peng Liu and Sergey Zyrianov for various corrections

Via

Access Paper or Ask Questions

Playing Atari with Deep Reinforcement Learning

Dec 19, 2013

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

Figure 1 for Playing Atari with Deep Reinforcement Learning

Figure 2 for Playing Atari with Deep Reinforcement Learning

Figure 3 for Playing Atari with Deep Reinforcement Learning

Figure 4 for Playing Atari with Deep Reinforcement Learning

Abstract:We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

* NIPS Deep Learning Workshop 2013

Via

Access Paper or Ask Questions

Speech Recognition with Deep Recurrent Neural Networks

Mar 22, 2013

Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton

Figure 1 for Speech Recognition with Deep Recurrent Neural Networks

Figure 2 for Speech Recognition with Deep Recurrent Neural Networks

Abstract:Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates \emph{deep recurrent neural networks}, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

* To appear in ICASSP 2013

Via

Access Paper or Ask Questions

Sequence Transduction with Recurrent Neural Networks

Nov 14, 2012

Alex Graves

Figure 1 for Sequence Transduction with Recurrent Neural Networks

Figure 2 for Sequence Transduction with Recurrent Neural Networks

Figure 3 for Sequence Transduction with Recurrent Neural Networks

Figure 4 for Sequence Transduction with Recurrent Neural Networks

Abstract:Many machine learning tasks can be expressed as the transformation---or \emph{transduction}---of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since \emph{finding} the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.

* First published in the International Conference of Machine Learning (ICML) 2012 Workshop on Representation Learning

Via

Access Paper or Ask Questions

Phoneme recognition in TIMIT with BLSTM-CTC

Apr 21, 2008

Santiago Fernández, Alex Graves, Juergen Schmidhuber

Figure 1 for Phoneme recognition in TIMIT with BLSTM-CTC

Abstract:We compare the performance of a recurrent neural network with the best results published so far on phoneme recognition in the TIMIT database. These published results have been obtained with a combination of classifiers. However, in this paper we apply a single recurrent neural network to the same task. Our recurrent neural network attains an error rate of 24.6%. This result is not significantly different from that obtained by the other best methods, but they rely on a combination of classifiers for achieving comparable performance.

* 8 pages

Via

Access Paper or Ask Questions

Multi-Dimensional Recurrent Neural Networks

May 14, 2007

Alex Graves, Santiago Fernandez, Juergen Schmidhuber

Figure 1 for Multi-Dimensional Recurrent Neural Networks

Figure 2 for Multi-Dimensional Recurrent Neural Networks

Figure 3 for Multi-Dimensional Recurrent Neural Networks

Figure 4 for Multi-Dimensional Recurrent Neural Networks

Abstract:Recurrent neural networks (RNNs) have proved effective at one dimensional sequence learning tasks, such as speech and online handwriting recognition. Some of the properties that make RNNs suitable for such tasks, for example robustness to input warping, and the ability to access contextual information, are also desirable in multidimensional domains. However, there has so far been no direct way of applying RNNs to data with more than one spatio-temporal dimension. This paper introduces multi-dimensional recurrent neural networks (MDRNNs), thereby extending the potential applicability of RNNs to vision, video processing, medical imaging and many other areas, while avoiding the scaling problems that have plagued other multi-dimensional models. Experimental results are provided for two image segmentation tasks.

* 10 pages, 10 figures

Via

Access Paper or Ask Questions