Simon Lacoste-Julien

DIRO, MILA

Data-Efficient Structured Pruning via Submodular Optimization

Mar 09, 2022

Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien

Abstract: Structured pruning, which removes redundant regular regions of weights, is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, current structured pruning methods are highly empirical in nature, do not provide any theoretical guarantees, and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select neurons/channels to prune and corresponding new weights for the next layer that minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, thus it can be provably approximated using an efficient greedy algorithm. Our method is one of the few in the literature that uses only a limited number of training data and no labels. Our experimental results demonstrate that our method outperforms popular baseline methods in various one-shot pruning settings.
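As a rough illustration of the greedy idea mentioned in this abstract, the sketch below greedily picks columns (stand-ins for neurons to keep) so that a least-squares combination of them best reconstructs a single target vector. This is a simplified toy and not the authors' implementation: the names `greedy_select`, `columns`, and `target` are hypothetical, and the paper's objective covers a whole layer rather than one vector.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def greedy_select(columns, target, k):
    """Greedily pick up to k columns whose span best reconstructs `target`.

    Hypothetical toy objective: the reduction in squared reconstruction error.
    """
    residual = list(target)
    basis = []    # orthonormal basis for the span of the chosen columns
    chosen = []
    for _ in range(k):
        best_j, best_gain, best_q = None, 1e-12, None
        for j, col in enumerate(columns):
            if j in chosen:
                continue
            # Remove the part of this column already explained by the basis.
            q = list(col)
            for b in basis:
                c = dot(q, b)
                q = [qi - c * bi for qi, bi in zip(q, b)]
            norm_sq = dot(q, q)
            if norm_sq < 1e-12:
                continue  # column is redundant given the chosen ones
            # Marginal gain: error reduction obtained by adding column j.
            gain = dot(q, residual) ** 2 / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
                best_q = [qi / norm_sq ** 0.5 for qi in q]
        if best_j is None:
            break  # no remaining column reduces the error
        chosen.append(best_j)
        basis.append(best_q)
        c = dot(residual, best_q)
        residual = [ri - c * qi for ri, qi in zip(residual, best_q)]
    return chosen, dot(residual, residual)
```

Each round adds the column with the largest marginal error reduction, which is the standard greedy rule whose approximation guarantee is what weak submodularity provides.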

Bayesian Structure Learning with Generative Flow Networks

Feb 28, 2022

Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

Abstract:In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and it compares favorably against other methods based on MCMC or variational inference.
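The edge-at-a-time construction described in this abstract can be sketched as follows. This is only a structural illustration: the uniform choice over valid edges and the fixed `stop_prob` are placeholders for the learned transition probabilities, and the names `creates_cycle` and `sample_dag` are hypothetical rather than taken from the paper's code.

```python
import random

def creates_cycle(adj, src, dst):
    """Return True if adding the edge src -> dst would close a directed cycle."""
    # A cycle appears exactly when src is already reachable from dst.
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return False

def sample_dag(num_nodes, stop_prob=0.2, rng=None):
    """Build a DAG one edge at a time, masking edges that would break acyclicity."""
    rng = rng or random.Random(0)
    adj = {i: set() for i in range(num_nodes)}
    while True:
        # Valid actions: edges not yet present that keep the graph acyclic.
        valid = [(u, v) for u in range(num_nodes) for v in range(num_nodes)
                 if u != v and v not in adj[u] and not creates_cycle(adj, u, v)]
        # Placeholder policy: stop with a fixed probability, else pick uniformly.
        if not valid or rng.random() < stop_prob:
            return adj
        u, v = rng.choice(valid)
        adj[u].add(v)
```

Because every invalid edge is masked before sampling, each partial graph along the trajectory is itself a DAG.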

Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

Nov 23, 2021

Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

Abstract:Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equivariance and improve it with approximate implicit differentiation, allowing for better optimization while being faster and saving memory. In a range of toy experiments, we show that the perspective of multiset-equivariance is beneficial and that our changes to DSPN achieve better results in most cases. On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation.

Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

Nov 12, 2021

Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

Abstract:We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime. After exhibiting various facets of the problem, we show we can interpret the MAP as running stochastic mirror descent (SMD) on the log-likelihood. However, modern convergence results do not apply for standard examples of the exponential family, highlighting holes in the convergence literature. We believe solving this very fundamental problem may bring progress to both the statistics and optimization communities.

* 9 pages and 3 figures + Appendix

A Survey of Self-Supervised and Few-Shot Object Detection

Nov 08, 2021

Gabriel Huang, Issam Laradji, David Vazquez, Simon Lacoste-Julien, Pau Rodriguez

Abstract:Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page at https://gabrielhuang.github.io/fsod-survey/

* Awesome Few-Shot Object Detection (Leaderboard) at https://github.com/gabrielhuang/awesome-few-shot-object-detection

Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA

Jul 21, 2021

Sébastien Lachapelle, Pau Rodríguez López, Rémi Le Priol, Alexandre Lacoste, Simon Lacoste-Julien

Abstract: It can be argued that finding an interpretable low-dimensional representation of a potentially high-dimensional phenomenon is central to the scientific enterprise. Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application. This work proposes mechanism sparsity regularization as a new principle to achieve nonlinear ICA when latent factors depend sparsely on observed auxiliary variables and/or past latent factors. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse and if some graphical criterion is satisfied by the data generating process. As a special case, our framework shows how one can leverage unknown-target interventions on the latent factors to disentangle them, thus drawing further connections between ICA and causality. We validate our theoretical results with toy experiments.

* Appears in: Workshop on the Neglected Assumptions in Causal Inference (NACI) at the 38th International Conference on Machine Learning, 2021. 19 pages

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Jun 30, 2021

Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

Abstract:Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) (Mescheder et al., 2017). SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
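To make the constant step-size behaviour concrete, here is a minimal sketch of SGDA on a toy quadratic game, f(x, y) = 0.5 x^2 + x y - 0.5 y^2, whose unique solution is (0, 0). The game, the additive Gaussian gradient noise, and the function name `sgda` are illustrative assumptions and do not reproduce the paper's setting or its noise model.

```python
import random

def sgda(steps=200, lr=0.1, noise_std=0.0, seed=0):
    """Run SGDA on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 from (1, 1).

    Returns the final distance to the solution (0, 0).
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    for _ in range(steps):
        # Noisy partial gradients (additive noise is a toy assumption).
        gx = (x + y) + rng.gauss(0.0, noise_std)
        gy = (x - y) + rng.gauss(0.0, noise_std)
        # Simultaneous update: descent step in x, ascent step in y.
        x, y = x - lr * gx, y + lr * gy
    return (x * x + y * y) ** 0.5

exact = sgda(noise_std=0.0)   # noiseless: linear convergence to the solution
noisy = sgda(noise_std=1.0)   # noisy: iterates hover in a neighborhood
```

In the noiseless case each step shrinks the squared distance by the factor (1 - lr)^2 + lr^2, which is 0.82 for lr = 0.1; with noise and a constant step size the iterates only reach a neighborhood of the solution, which is the regime that step-size switching rules are meant to address.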

* 35 pages, 3 figures, 1 table

Structured Convolutional Kernel Networks for Airline Crew Scheduling

May 25, 2021

Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

Abstract:Motivated by the needs from an airline crew scheduling application, we introduce structured convolutional kernel networks (Struct-CKN), which combine CKNs from Mairal et al. (2014) in a structured prediction framework that supports constraints on the outputs. CKNs are a particular kind of convolutional neural networks that approximate a kernel feature map on training data, thus combining properties of deep learning with the non-parametric flexibility of kernel methods. Extending CKNs to structured outputs allows us to obtain useful initial solutions on a flight-connection dataset that can be further refined by an airline crew scheduling solver. More specifically, we use a flight-based network modeled as a general conditional random field capable of incorporating local constraints in the learning process. Our experiments demonstrate that this approach yields significant improvements for the large-scale crew pairing problem (50,000 flights per month) over standard approaches, reducing the solution cost by 17% (a gain of millions of dollars) and the cost of global constraints by 97%.

* ICML 2021

Repurposing Pretrained Models for Robust Out-of-domain Few-Shot Learning

Mar 16, 2021

Namyeong Kwon, Hwidong Na, Gabriel Huang, Simon Lacoste-Julien

Abstract:Model-agnostic meta-learning (MAML) is a popular method for few-shot learning but assumes that we have access to the meta-training set. In practice, training on the meta-training set may not always be an option due to data privacy concerns, intellectual property issues, or merely lack of computing resources. In this paper, we consider the novel problem of repurposing pretrained MAML checkpoints to solve new few-shot classification tasks. Because of the potential distribution mismatch, the original MAML steps may no longer be optimal. Therefore we propose an alternative meta-testing procedure and combine MAML gradient steps with adversarial training and uncertainty-based stepsize adaptation. Our method outperforms "vanilla" MAML on same-domain and cross-domains benchmarks using both SGD and Adam optimizers and shows improved robustness to the choice of base stepsize.

* Appears in: Proceedings of the Ninth International Conference on Learning Representations (ICLR 2021). 20 pages

Online Adversarial Attacks

Mar 02, 2021

Andjela Mladenovic, Avishek Joey Bose, Hugo Berard, William L. Hamilton, Simon Lacoste-Julien, Pascal Vincent, Gauthier Gidel

Abstract: Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrevocable since they operate on a transient data stream. We first rigorously analyze a deterministic variant of the online threat model by drawing parallels to the well-studied $k$-secretary problem and propose \algoname, a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm. We also introduce the stochastic $k$-secretary (effectively reducing online blackbox attacks to a $k$-secretary problem under noise) and prove theoretical bounds on the competitive ratios of any online algorithms adapted to this setting. Finally, we complement our theoretical results by conducting a systematic suite of experiments on MNIST and CIFAR-10 with both vanilla and robust classifiers, revealing that, by leveraging online secretary algorithms, like \algoname, we can get an online attack success rate close to the one achieved by the optimal offline solution.
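For readers unfamiliar with the secretary setting, the sketch below shows a simplified single-threshold rule for picking up to $k$ items from a stream with irrevocable decisions. It is a generic baseline written for illustration, not the algorithm proposed in the paper; the function name `single_threshold_pick` and the choice of the sample maximum as threshold are assumptions.

```python
def single_threshold_pick(stream, sample_size, k):
    """Pick up to k items from a stream using one threshold.

    The first `sample_size` items are only observed. Afterwards, every item
    that beats the best observed value is accepted until k items are taken.
    Decisions are irrevocable: a skipped item cannot be picked later.
    """
    threshold = None
    picked = []
    for index, value in enumerate(stream):
        if index < sample_size:
            # Sampling phase: observe only, track the best value seen so far.
            if threshold is None or value > threshold:
                threshold = value
            continue
        if len(picked) == k:
            break
        if threshold is None or value > threshold:
            picked.append(value)
    return picked

print(single_threshold_pick([3, 1, 4, 1, 5, 9, 2, 6], sample_size=3, k=2))
# prints [5, 9]
```

In the example the sampling phase sees 3, 1, 4, so the threshold is 4; the items 5 and 9 are accepted, after which the budget of two picks is exhausted.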

* Preprint
