Edouard Oyallon

MLIA

Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

Jun 11, 2021
Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a classic greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing for layers or modules in networks to be trained with a potentially linear parallelization. With the use of a replay buffer, we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays. To address bandwidth and memory issues, we propose an approach based on online vector quantization. This drastically reduces the communication bandwidth between modules and the memory required for replay buffers. We show theoretically and empirically that this approach converges and compare it to the sequential solvers. We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset.

* arXiv admin note: substantial text overlap with arXiv:1901.08164
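
The idea of storing quantized activations in a replay buffer can be illustrated with a toy online k-means codebook. This is only a minimal sketch under stated assumptions: the class `OnlineVQ`, the running-mean update rule, and the synthetic two-cluster data are hypothetical choices made for illustration, not the procedure used in the paper.

```python
import random

def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    def dist2(c):
        return sum((ci - vi) ** 2 for ci, vi in zip(c, v))
    return min(range(len(codebook)), key=lambda k: dist2(codebook[k]))

class OnlineVQ:
    """Toy online k-means codebook (hypothetical helper, not the paper's code)."""

    def __init__(self, codebook):
        self.codebook = [list(c) for c in codebook]
        self.counts = [0] * len(codebook)

    def encode(self, v, update=True):
        """Return the code index for v; optionally move the codeword toward v."""
        k = nearest(self.codebook, v)
        if update:
            self.counts[k] += 1
            lr = 1.0 / self.counts[k]  # running-mean step size (an assumption)
            self.codebook[k] = [c + lr * (vi - c)
                                for c, vi in zip(self.codebook[k], v)]
        return k

    def decode(self, k):
        """Reconstruct an approximate activation from its code index."""
        return list(self.codebook[k])

random.seed(0)
vq = OnlineVQ([[0.0, 0.0], [1.0, 1.0]])
# Synthetic "activations": two clusters around (-2, -2) and (3, 3).
data = [[random.gauss(m, 0.1), random.gauss(m, 0.1)] for m in [-2.0, 3.0] * 200]
codes = [vq.encode(v) for v in data]

# A replay buffer only needs to hold small integer codes, not float vectors.
buffer = codes[-8:]
replayed = [vq.decode(k) for k in buffer]
```

In this sketch a module would transmit and store one integer per activation vector and decode it on demand, which is where the bandwidth and memory savings would come from.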

Interferometric Graph Transform for Community Labeling

Jun 04, 2021
Nathan Grinsztajn, Louis Leconte, Philippe Preux, Edouard Oyallon

We present a new approach for learning unsupervised node representations in community graphs. We significantly extend the Interferometric Graph Transform (IGT) to community labeling: this non-linear operator iteratively extracts features that take advantage of the graph topology through demodulation operations. An unsupervised feature extraction step cascades modulus non-linearity with linear operators that aim at building relevant invariants for community labeling. Via a simplified model, we show that the IGT concentrates around the E-IGT: those two representations are related through some ergodicity properties. Experiments on community labeling tasks show that this unsupervised representation achieves performance on par with the state of the art on the standard, challenging datasets Cora, Citeseer, Pubmed and WikiCS.

Low-Rank Projections of GCNs Laplacian

Jun 04, 2021
Nathan Grinsztajn, Philippe Preux, Edouard Oyallon

In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the performance of a GCN: we empirically show that most of the information needed and used for node classification is contained in the low-frequency domain, and thus, contrary to images, high frequencies are less crucial to community detection. In particular, it is sometimes possible to obtain accuracies at a state-of-the-art level with simple classifiers that rely only on a few low frequencies.

* ICLR 2021 Workshop GTRL, 2021, Online, France
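
A small example may help make "relying only on a few low frequencies" concrete. The sketch below is an illustration under assumptions, not the paper's experimental setup: it uses a path graph, chosen only because the eigenvectors of its Laplacian are known in closed form (the DCT-II basis), and a synthetic two-community signal corrupted by a high-frequency alternating perturbation. The function names are hypothetical.

```python
import math

def path_laplacian_eigvec(n, k):
    """k-th unit-norm Laplacian eigenvector of an n-node path graph (DCT-II basis)."""
    v = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def low_pass(signal, num_freqs):
    """Project a node signal onto the num_freqs lowest-frequency eigenvectors."""
    n = len(signal)
    out = [0.0] * n
    for k in range(num_freqs):
        u = path_laplacian_eigvec(n, k)
        coeff = sum(ui * si for ui, si in zip(u, signal))
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out

def accuracy(signal, labels):
    """Fraction of nodes whose sign matches the +1/-1 community label."""
    preds = [1 if s > 0 else -1 for s in signal]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

n = 20
labels = [1] * (n // 2) + [-1] * (n // 2)       # two communities
# Add a purely high-frequency (alternating) perturbation to the labels.
noisy = [y + 1.5 * (-1) ** i for i, y in enumerate(labels)]
filtered = low_pass(noisy, num_freqs=4)          # keep only 4 low frequencies

print(accuracy(noisy, labels), accuracy(filtered, labels))
```

On this toy signal, thresholding the raw noisy values misclassifies half of the nodes, while thresholding the rank-4 low-frequency projection recovers both communities.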

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

Jan 19, 2021
Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis. In this work, we highlight the importance of a data-dependent feature extraction step that is key to obtaining good performance in convolutional kernel methods. This step typically corresponds to a whitened dictionary of patches, and gives rise to data-driven convolutional kernel methods. We extensively study its effect, demonstrating that it is the key ingredient for high performance of these methods. Specifically, we show that one of the simplest instances of such kernel methods, based on a single layer of image patches followed by a linear classifier, already obtains classification accuracies on CIFAR-10 in the same range as previous, more sophisticated convolutional kernel methods. We scale this method to the challenging ImageNet dataset, showing that such a simple approach can exceed all existing non-learned representation methods. This is a new baseline for object recognition without representation learning methods, which initiates the investigation of convolutional kernel models on ImageNet. We conduct experiments to analyze the dictionary that we used, and our ablations show that it exhibits low-dimensional properties.

* International Conference on Learning Representations (ICLR 2021), 2021, Vienna (online), Austria

Interferometric Graph Transform: a Deep Unsupervised Graph Representation

Jun 10, 2020
Edouard Oyallon

We propose the Interferometric Graph Transform (IGT), which is a new class of deep unsupervised graph convolutional neural network for building graph representations. Our first contribution is to propose a generic, complex-valued spectral graph architecture obtained from a generalization of the Euclidean Fourier transform. We show that our learned representation consists of both discriminative and invariant features, thanks to a novel greedy concave objective. From our experiments, we conclude that our learning procedure exploits the topology of the spectral domain, which is normally a flaw of spectral methods, and in particular our method can recover an analytic operator for vision tasks. We test our algorithm on various challenging tasks such as image classification (MNIST, CIFAR-10), community detection (Authorship, Facebook graph) and action recognition from 3D skeleton videos (SBU, NTU), exhibiting a new state of the art in spectral graph unsupervised settings.

* International Conference on Machine Learning (ICML), 2020, Online, Austria

Decoupled Greedy Learning of CNNs

Jan 23, 2019
Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the network before updating. We consider and analyze a training procedure, Decoupled Greedy Learning (DGL), that addresses this problem more effectively and at scales beyond those of previous solutions. It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing for layers or modules in networks to be trained with a potentially linear parallelization in layers. We show theoretically and empirically that this approach converges. In addition, we empirically find that it can lead to better generalization than sequential greedy optimization and even standard end-to-end back-propagation. We show that an extension of this approach to asynchronous settings, where modules can operate with large communication delays, is possible with the use of a replay buffer. We demonstrate the effectiveness of DGL on the CIFAR-10 dataset against alternatives and on the large-scale ImageNet dataset, where we are able to effectively train VGG and ResNet-152 models.

* 14 pages
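
The decoupling described in this abstract can be sketched with a deliberately tiny model. The example below is a toy under stated assumptions, not the authors' implementation: each "module" is a single scalar unit with its own auxiliary head and squared-error loss, gradients are derived by hand, and the class `GreedyModule` and function `train` are hypothetical names. The point it illustrates is that every module updates from its own local loss and only passes its output forward, so no gradient crosses module boundaries.

```python
import math

class GreedyModule:
    """One scalar 'layer' h = tanh(w * x) with its own auxiliary head p = a * h."""

    def __init__(self, w=0.5, a=0.5):
        self.w, self.a = w, a

    def forward(self, x):
        return math.tanh(self.w * x)

    def local_step(self, x, y, lr):
        """SGD step on the local loss (a * h - y)^2; x is treated as a constant."""
        h = self.forward(x)
        err = self.a * h - y
        grad_a = 2.0 * err * h
        grad_w = 2.0 * err * self.a * (1.0 - h * h) * x
        self.a -= lr * grad_a
        self.w -= lr * grad_w
        return err * err

def train(modules, xs, ys, epochs, lr=0.05):
    """Return the per-epoch mean local loss of every module."""
    history = []
    for _ in range(epochs):
        totals = [0.0] * len(modules)
        for x, y in zip(xs, ys):
            signal = x
            for j, m in enumerate(modules):
                totals[j] += m.local_step(signal, y, lr)
                # Only the (updated) module's output moves forward, as a plain
                # float: nothing is back-propagated into earlier modules.
                signal = m.forward(signal)
        history.append([t / len(xs) for t in totals])
    return history

xs = [i / 10 for i in range(-10, 11)]
ys = [0.5 * x for x in xs]            # toy regression target
modules = [GreedyModule(), GreedyModule()]
history = train(modules, xs, ys, epochs=200)
print(history[0], history[-1])
```

Here the modules are visited in a sequential loop for simplicity; because each update depends only on the module's own input and the target, the same steps could in principle run on separate workers, which is the parallelization the abstract refers to.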

Greedy Layerwise Learning Can Scale to ImageNet

Dec 29, 2018
Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches using shallow networks, we focus on problems where deep learning is reported as critical for success. We thus study CNNs on image recognition tasks using the large-scale ImageNet dataset and the CIFAR-10 dataset. Using a simple set of ideas for architecture and training, we find that solving sequential 1-hidden-layer auxiliary problems leads to a CNN that exceeds AlexNet performance on ImageNet. Extending our training methodology to construct individual layers by solving 2- and 3-hidden-layer auxiliary problems, we obtain an 11-layer network that exceeds VGG-11 on ImageNet, obtaining 89.8% top-5 single crop. To our knowledge, this is the first competitive alternative to end-to-end training of CNNs that can scale to ImageNet. We conduct a wide range of experiments to study the properties this induces on the intermediate layers.

Kymatio: Scattering Transforms in Python

Dec 28, 2018
Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim Andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, Michael Eickenberg

The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU), offering a considerable speed-up over CPU implementations. The package also has a small memory footprint, resulting in efficient memory usage. The source code, documentation, and examples are available under a BSD license at https://www.kymat.io/

Compressing the Input for CNNs with the First-Order Scattering Transform

Sep 27, 2018
Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and total signal size. We demonstrate that cascading a CNN with this representation performs on par with ImageNet classification models commonly used in downstream tasks, such as the ResNet-50. We subsequently apply our trained hybrid ImageNet model as a base model on a detection system, which typically has larger image inputs. On Pascal VOC and COCO detection tasks, we demonstrate improvements in the inference speed and training memory consumption compared to models trained directly on the input image.

* ECCV 2018
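
The structure of a first-order scattering transform (band-pass filtering, modulus, then low-pass averaging with subsampling) can be sketched in one dimension. The code below is a toy under assumptions and not the transform used in the paper: it works on 1D signals rather than images, and it swaps in Haar-like difference filters and box averages where wavelet scattering implementations typically use Morlet wavelets and Gaussian low-pass filters. All function names are hypothetical.

```python
import math

def box_average(x, width):
    """Low-pass by non-overlapping block means, i.e. subsampling by `width`."""
    return [sum(x[i:i + width]) / width
            for i in range(0, len(x) - width + 1, width)]

def haar_modulus(x, half):
    """|x * psi| for a Haar-like wavelet of half-width `half` (circular borders)."""
    n = len(x)
    out = []
    for t in range(n):
        left = sum(x[(t - d - 1) % n] for d in range(half)) / half
        right = sum(x[(t + d) % n] for d in range(half)) / half
        out.append(abs(right - left))
    return out

def first_order_scattering(x, J):
    """Return [S0, S1 at scale 0, ..., S1 at scale J-1], each subsampled by 2**J."""
    s0 = box_average(x, 2 ** J)
    s1 = [box_average(haar_modulus(x, 2 ** j), 2 ** J) for j in range(J)]
    return [s0] + s1

x = [math.sin(2 * math.pi * i / 8) for i in range(32)]
coeffs = first_order_scattering(x, J=3)

# 32 input samples become (1 + J) channels of 32 / 2**J = 4 samples each.
print(len(coeffs), [len(c) for c in coeffs])
```

In this toy configuration the output has 16 coefficients for 32 input samples, at one eighth of the input resolution, which mirrors the trade-off the abstract describes: lower resolution and smaller total size in exchange for more channels.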

Scattering Networks for Hybrid Representation Learning

Sep 17, 2018
Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we demonstrate that the early layers of CNNs do not necessarily need to be learned, and can be replaced with a scattering network instead. Indeed, using hybrid architectures, we achieve the best results with predefined representations to date, while being competitive with end-to-end learned CNNs. Specifically, even applying a shallow cascade of small-windowed scattering coefficients followed by 1$\times$1-convolutions results in AlexNet accuracy on the ILSVRC2012 classification task. Moreover, by combining scattering networks with deep residual networks, we achieve a single-crop top-5 error of 11.4% on ILSVRC2012. Also, we show they can yield excellent performance in the small sample regime on CIFAR-10 and STL-10 datasets, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. For unsupervised learning, scattering coefficients can be a competitive representation that permits image recovery. We use this fact to train hybrid GANs to generate images. Finally, we empirically analyze several properties related to stability and reconstruction of images from scattering coefficients.

* IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2018, pp.11
* arXiv admin note: substantial text overlap with arXiv:1703.08961
