Max Jaderberg

Convolution by Evolution: Differentiable Pattern Producing Networks

Jun 08, 2016

Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra

Abstract: In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN. Unlike a standard CPPN, the topology of a DPPN is evolved but the weights are learned. A Lamarckian algorithm that combines evolution and learning produces DPPNs to reconstruct an image. Our main result is that DPPNs can be evolved/trained to compress the weights of a denoising autoencoder from 157684 to roughly 200 parameters, while achieving a reconstruction accuracy comparable to a fully connected network with more than two orders of magnitude more parameters. The regularization ability of the DPPN allows it to rediscover (approximate) convolutional network architectures embedded within a fully connected architecture. Such convolutional architectures are the current state of the art for many computer vision applications, so it is satisfying that DPPNs are capable of discovering this structure rather than having to build it in by design. DPPNs exhibit better generalization than directly encoded fully connected autoencoders when tested on the Omniglot dataset after being trained on MNIST. DPPNs are therefore a new framework for integrating learning and evolution.
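
The compression idea in this abstract can be illustrated with a minimal sketch of indirect weight encoding: a small function of unit coordinates generates every entry of a large weight matrix. This is not the DPPN from the paper. The Gaussian form, the three parameters, and the names `cppn_weight` and `generate_weights` are assumptions chosen only to show how a coordinate-based generator can yield convolution-like, translation-invariant weights.

```python
import math

def cppn_weight(x_in, y_in, x_out, y_out, params):
    """Map the coordinates of a source pixel and a target pixel to one weight.

    Hypothetical generator: a Gaussian bump over the relative offset.
    """
    amplitude, width, offset = params
    dx, dy = x_out - x_in, y_out - y_in
    # The weight depends only on (dx, dy), so every target unit receives
    # the same local filter, which is what a convolution does.
    return amplitude * math.exp(-(dx * dx + dy * dy) / (2 * width * width)) + offset

def generate_weights(side, params):
    """Build the full (side*side) x (side*side) matrix from the generator."""
    coords = [(x, y) for y in range(side) for x in range(side)]
    return [[cppn_weight(xi, yi, xo, yo, params) for (xi, yi) in coords]
            for (xo, yo) in coords]

params = (1.0, 0.8, 0.0)                      # 3 generator parameters
weights = generate_weights(side=4, params=params)
n_weights = len(weights) * len(weights[0])    # 16 * 16 generated weights
```

Here three parameters produce 256 weights, and the count of generated weights grows with the image size while the generator stays the same size.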

Spatial Transformer Networks

Feb 04, 2016

Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu

Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
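
As a rough illustration of what "spatially transforming a feature map" can mean, the sketch below warps a single-channel map with a 2x3 affine matrix and bilinear sampling. It assumes the transform parameters `theta` are already given; in the paper they are predicted by the network, which is not modelled here. The function names and the normalised [-1, 1] coordinate convention are assumptions of this sketch.

```python
import math

def bilinear_sample(img, x, y):
    """Sample img (a list of rows) at real-valued pixel coords; zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                # Weight falls off linearly with distance to each neighbour.
                total += (1 - abs(x - xi)) * (1 - abs(y - yi)) * img[yi][xi]
    return total

def affine_transform(img, theta):
    """Warp img; theta maps normalised output coords to normalised input coords."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        yt = -1 + 2 * i / (h - 1)
        row = []
        for j in range(w):
            xt = -1 + 2 * j / (w - 1)
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Convert normalised source coords back to pixel coords.
            row.append(bilinear_sample(img, (xs + 1) * (w - 1) / 2,
                                       (ys + 1) * (h - 1) / 2))
        out.append(row)
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = affine_transform(img, [[1, 0, 0], [0, 1, 0]])
shifted = affine_transform(img, [[1, 0, 1], [0, 1, 0]])  # sample one pixel to the right
```

Because the bilinear weights are piecewise-linear in the sampling coordinates, the output is differentiable almost everywhere with respect to `theta`, which is what allows such a module to be trained by back-propagation.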

Deep Structured Output Learning for Unconstrained Text Recognition

Apr 10, 2015

Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning. Training uses purely synthetically generated data. The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable.
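
The combination of per-position character scores (unaries) with N-gram scores (higher order terms) can be sketched as a simple additive word score. This is a toy under stated assumptions, not the paper's model: the scores are hand-written numbers instead of CNN outputs, the word length is fixed, and decoding is a brute-force search over a two-letter alphabet. The names `word_score` and `best_word` are hypothetical.

```python
from itertools import product

def word_score(word, unary, ngram_scores, max_n=2):
    """Sum per-position character scores plus scores of the N-grams present."""
    score = sum(unary[i][c] for i, c in enumerate(word))
    for n in range(2, max_n + 1):
        for i in range(len(word) - n + 1):
            score += ngram_scores.get(word[i:i + n], 0.0)
    return score

def best_word(unary, ngram_scores, alphabet):
    """Exhaustively score every string of the given length (toy sizes only)."""
    candidates = ("".join(cs) for cs in product(alphabet, repeat=len(unary)))
    return max(candidates, key=lambda w: word_score(w, unary, ngram_scores))

# Made-up scores for a two-character word over the alphabet "ab".
unary = [{"a": 1.0, "b": 0.9}, {"a": 0.5, "b": 0.6}]
chars_only = best_word(unary, {}, "ab")
with_ngrams = best_word(unary, {"ba": 1.0}, "ab")
```

With character scores alone the best string is "ab"; adding evidence for the bigram "ba" changes the decoded word to "ba", which shows how the higher order terms can override individual character predictions.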

* arXiv admin note: text overlap with arXiv:1406.2227

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

Dec 09, 2014

Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engine -- synthetic data that is highly realistic and sufficient to replace real data, giving us infinite amounts of training data. This excess of data exposes new possibilities for word recognition models, and here we consider three models, each one "reading" words in a different way: via 90k-way dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. In the scenarios of language based and completely unconstrained text recognition we greatly improve upon state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.
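
The three target encodings named in this abstract can be sketched as follows. The three-word `LEXICON` stands in for the 90k-word dictionary, and `MAX_LEN`, `ALPHABET`, the padding index 0, and the N-gram vocabulary passed to `ngram_encoding` are all assumptions made for illustration, not values taken from the paper.

```python
LEXICON = ["spire", "spires", "text"]          # toy stand-in for the dictionary
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
MAX_LEN = 8                                    # assumed maximum word length

def dictionary_encoding(word):
    """One class per dictionary word: the target is the word's index."""
    return LEXICON.index(word)

def char_sequence_encoding(word):
    """One class per position: 1-based character ids, padded with 0."""
    ids = [ALPHABET.index(c) + 1 for c in word]
    return ids + [0] * (MAX_LEN - len(ids))

def bag_of_ngrams(word, max_n=3):
    """The unordered set of all substrings of length 1..max_n."""
    return {word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)}

def ngram_encoding(word, vocab):
    """Binary vector marking which vocabulary N-grams occur in the word."""
    grams = bag_of_ngrams(word)
    return [1 if g in grams else 0 for g in vocab]

class_id = dictionary_encoding("spires")
char_ids = char_sequence_encoding("text")
grams = bag_of_ngrams("spires")
ngram_vec = ngram_encoding("spires", ["sp", "xt", "res"])
```

The dictionary encoding can only output words in the lexicon, while the character sequence and N-gram encodings can in principle describe strings outside it.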

Reading Text in the Wild with Convolutional Neural Networks

Dec 04, 2014

Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: In this work we present an end-to-end system for text spotting -- localising and recognising text in natural scene images -- and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

Speeding up Convolutional Neural Networks with Low Rank Expansions

May 15, 2014

Max Jaderberg, Andrea Vedaldi, Andrew Zisserman

Abstract: The focus of this paper is speeding up the evaluation of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastically speeding up these layers. This is achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain. Our methods are architecture agnostic, and can be easily applied to existing CPU and GPU convolutional frameworks for tuneable speedup performance. We demonstrate this with a real world network designed for scene text character recognition, showing a possible 2.5x speedup with no loss in accuracy, and 4.5x speedup with less than 1% drop in accuracy, still achieving state-of-the-art on standard benchmarks.
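
The building block behind filters that are "rank-1 in the spatial domain" can be shown with a small sketch: a rank-1 k x k filter is the outer product of a vertical and a horizontal 1D filter, so it can be applied as two cheap 1D passes. The sketch covers only this exact separable case, not the paper's schemes for approximating a whole bank of filters. The helper `conv2d_valid` is hypothetical and computes cross-correlation without padding, as CNN frameworks typically do.

```python
def conv2d_valid(img, kernel):
    """Cross-correlate a 2D list with a 2D kernel, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

vertical = [1, 2, 1]        # 1D filter applied down the columns
horizontal = [1, 0, -1]     # 1D filter applied along the rows

# Rank-1 3x3 filter: outer product of the two 1D filters.
full_kernel = [[v * hz for hz in horizontal] for v in vertical]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]

direct = conv2d_valid(img, full_kernel)                 # k*k multiplies per output
column_pass = conv2d_valid(img, [[v] for v in vertical])
separable = conv2d_valid(column_pass, [horizontal])     # roughly 2*k per output
```

Both routes give the same result, while the separable route needs on the order of 2k rather than k^2 multiplications per output pixel (6 versus 9 for k = 3).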
