Anna Choromanska

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

Oct 24, 2018
Anna Choromanska, Sadhana Kumaravel, Ronny Luss, Irina Rish, Brian Kingsbury, Mattia Rigotti, Paolo DiAchille, Viatcheslav Gurev, Ravi Tejwani, Djallel Bouneffouf

We propose a novel online alternating minimization (AltMin) algorithm for training deep neural networks, provide theoretical convergence guarantees, and demonstrate its advantages on several classification tasks compared both to standard backpropagation with stochastic gradient descent (backprop-SGD) and to offline alternating minimization. The key difference from backpropagation is an explicit optimization over hidden activations, which eliminates the gradient chain computation in backprop and breaks the weight training problem into independent, local optimization subproblems; this makes it possible to avoid vanishing gradient issues, simplify the handling of non-differentiable nonlinearities, and perform parallel weight updates across the layers. Moreover, parallel local synaptic weight optimization with explicit activation propagation is a step closer to a more biologically plausible learning model than backpropagation, whose biological implausibility has been frequently criticized. Finally, the online nature of our approach makes it possible to handle very large datasets, as well as continual, lifelong learning, which is our key contribution on top of recently proposed offline alternating minimization schemes (e.g., Carreira-Perpinan and Wang 2014; Taylor et al. 2016).

* First four authors contributed equally to this work: A.C. - theory, manuscript, S.K. - code, experiments, R.L. - algorithm, experiments, I.R. - algorithm, manuscript

VisualBackProp for learning using privileged information with CNNs

May 24, 2018
Devansh Bisla, Anna Choromanska

In many machine learning applications, from medical diagnostics to autonomous driving, prior knowledge can be used to improve the predictive performance of learning algorithms, to incorporate 'physical', 'domain knowledge', or 'common sense' concepts into the training of machine learning systems, and to verify constraints or properties of those systems. We explore the learning using privileged information paradigm and show how to incorporate privileged information, such as a segmentation mask available along with the classification label of each example, into the training stage of convolutional neural networks. This is done by augmenting the CNN model with an architectural component that focuses the model's attention on the desired region of the input image during training and that is transparent to the network's label prediction mechanism at test time. This component corresponds to the visualization strategy for identifying the parts of the input that contribute most to the prediction, often referred to as the visualization mask, yet it uses this strategy in reverse to the classical setting in order to enforce the desired visualization mask instead. We verify our proposed algorithms through exhaustive experiments on the benchmark ImageNet and PASCAL VOC data sets and achieve performance improvements of $2.4\%$ and $2.7\%$ over standard single-supervision model training. Finally, we confirm the effectiveness of our approach on a skin lesion classification problem.

A Deep Unsupervised Learning Approach Toward MTBI Identification Using Diffusion MRI

Apr 11, 2018
Shervin Minaee, Yao Wang, Anna Choromanska, Sohae Chung, Xiuyuan Wang, Els Fieremans, Steven Flanagan, Joseph Rath, Yvonne W Lui

Mild traumatic brain injury is a growing public health problem with an estimated incidence of over 1.7 million people annually in the US. Diagnosis is based on clinical history and symptoms, and accurate, concrete measures of injury are lacking. This work aims to directly use diffusion MR images obtained within one month of trauma to detect injury by incorporating deep learning techniques. To overcome the challenge posed by limited training data, we describe each brain region using the bag-of-words representation, which specifies the distribution of representative patch patterns. We apply a convolutional auto-encoder to overlapping image patches extracted from the diffusion MR images of the brain to learn patch-level features in an unsupervised manner. Our experimental results show that the bag-of-words representation using patch-level features learnt by the auto-encoder performs similarly to the one using the raw patch patterns, and that both significantly outperform earlier work relying on the mean values of MR metrics in selected brain regions.

* arXiv admin note: text overlap with arXiv:1710.06824

LSALSA: efficient sparse coding in single and multiple dictionary settings

Feb 13, 2018
Benjamin Cowen, Apoorva Nandini Saridena, Anna Choromanska

We propose an efficient sparse coding (SC) framework for obtaining sparse representations of data. The proposed framework is very general and applies both to the single dictionary setting, where each data point is represented as a sparse combination of the columns of one dictionary matrix, and to the multiple dictionary setting as given in morphological component analysis (MCA), where the goal is to separate the data into additive parts such that each part has a distinct sparse representation within an appropriately chosen corresponding dictionary. Both tasks have been cast as $\ell_1$-regularized optimization problems of minimizing quadratic reconstruction error. In an effort to accelerate traditional acquisition of sparse codes, we propose a deep learning architecture that constitutes a trainable time-unfolded version of the Split Augmented Lagrangian Shrinkage Algorithm (SALSA), a special case of the alternating direction method of multipliers (ADMM). We empirically validate both variants of the algorithm on image vision tasks and demonstrate that at inference our networks achieve improvements in running time and in the quality of estimated sparse codes on both classic SC and MCA problems over more common baselines. We finally demonstrate the visual advantage of our technique on the task of source separation.
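
For readers unfamiliar with the baseline being accelerated, the following is a minimal sketch of plain (non-learned) ADMM for the single-dictionary $\ell_1$-regularized problem $\min_x \frac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1$. It is not the trainable unfolded network from the paper; the function names `soft_threshold` and `admm_sparse_code`, the penalty parameter `rho`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_sparse_code(D, y, lam=0.1, rho=1.0, n_iter=200):
    """Hypothetical helper: ADMM for 0.5*||y - D x||^2 + lam*||x||_1.

    The variable is split as x = z; u is the scaled dual variable.
    """
    n_atoms = D.shape[1]
    # The matrix in the x-update is fixed, so invert it once up front.
    inv = np.linalg.inv(D.T @ D + rho * np.eye(n_atoms))
    Dty = D.T @ y
    z = np.zeros(n_atoms)
    u = np.zeros(n_atoms)
    for _ in range(n_iter):
        x = inv @ (Dty + rho * (z - u))       # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)  # shrinkage step
        u = u + x - z                         # dual update
    return z  # z holds the exactly-sparse iterate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 50))
    y = rng.standard_normal(20)
    code = admm_sparse_code(D, y, lam=0.5)
    print("nonzeros:", int(np.count_nonzero(code)), "of", code.size)
```

With an identity dictionary the minimizer is simply the soft-thresholded signal, which gives a quick sanity check of the iteration.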

Invertible Autoencoder for domain adaptation

Feb 10, 2018
Yunfei Teng, Anna Choromanska, Mariusz Bojarski

Unsupervised image-to-image translation aims at finding a mapping between the source ($A$) and target ($B$) image domains, where in many applications aligned image pairs are not available at training. This is an ill-posed learning problem since it requires inferring the joint probability distribution from marginals. Joint learning of coupled mappings $F_{AB}: A \rightarrow B$ and $F_{BA}: B \rightarrow A$ is commonly used by state-of-the-art methods, like CycleGAN [Zhu et al., 2017], to learn this translation by introducing a cycle consistency requirement to the learning problem, i.e. $F_{AB}(F_{BA}(B)) \approx B$ and $F_{BA}(F_{AB}(A)) \approx A$. Cycle consistency enforces the preservation of the mutual information between input and translated images. However, it does not explicitly enforce $F_{BA}$ to be an inverse operation to $F_{AB}$. We propose a new deep architecture that we call invertible autoencoder (InvAuto) to explicitly enforce this relation. This is done by forcing an encoder to be an inverted version of the decoder, where corresponding layers perform opposite mappings and share parameters. The mappings are constrained to be orthonormal. The resulting architecture reduces the number of trainable parameters (by up to a factor of $2$). We present image translation results on benchmark data sets and demonstrate state-of-the-art performance of our approach. Finally, we test the proposed domain adaptation method on the task of road video conversion. We demonstrate that the videos converted with InvAuto have high quality and show that the NVIDIA neural-network-based end-to-end learning system for autonomous driving, known as PilotNet, trained on real road videos performs well when tested on the converted ones.
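
The parameter-sharing idea can be illustrated with a single linear layer: if the encoder weight matrix $W$ is orthonormal, the decoder can reuse it as $W^\top$ and exactly undo the encoding. The sketch below assumes a dense layer without nonlinearity, which is a simplification of the architecture described in the abstract; the class name `TiedOrthonormalLayer` and the penalty function are hypothetical names introduced for illustration.

```python
import numpy as np

def random_orthonormal(n, seed=0):
    """Draw an n x n orthonormal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def orthonormality_penalty(W):
    """Squared Frobenius distance of W W^T from the identity.

    A term like this could be added to a training loss to keep W close
    to orthonormal (an assumption, not necessarily the paper's choice).
    """
    n = W.shape[0]
    return float(np.sum((W @ W.T - np.eye(n)) ** 2))

class TiedOrthonormalLayer:
    """Hypothetical linear layer whose decoder reuses the encoder weights."""

    def __init__(self, n, seed=0):
        self.W = random_orthonormal(n, seed)

    def encode(self, x):
        return self.W @ x

    def decode(self, h):
        # For orthonormal W the inverse is the transpose, so no extra
        # decoder parameters are needed.
        return self.W.T @ h

if __name__ == "__main__":
    layer = TiedOrthonormalLayer(4)
    x = np.arange(4, dtype=float)
    print(np.allclose(layer.decode(layer.encode(x)), x))
```

Because the decoder holds no weights of its own, the encoder-decoder pair uses half the parameters of an untied pair, which matches the reduction mentioned in the abstract.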

VisualBackProp: efficient visualization of CNNs

May 19, 2017
Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Larry Jackel, Urs Muller, Karol Zieba

This paper proposes a new method, which we call VisualBackProp, for visualizing which sets of pixels of the input image contribute most to the predictions made by a convolutional neural network (CNN). The method hinges on the intuition that the feature maps contain less and less information irrelevant to the prediction decision when moving deeper into the network. The technique we propose was developed as a debugging tool for CNN-based systems for steering self-driving cars and is therefore required to run in real time, i.e. it was designed to require fewer computations than a forward propagation. This makes the presented visualization method a valuable debugging tool which can be easily used during both training and inference. We furthermore justify our approach with theoretical arguments and theoretically confirm that the proposed method identifies sets of input pixels, rather than individual pixels, that collaboratively contribute to the prediction. Our theoretical findings stand in agreement with the experimental results. The empirical evaluation shows the plausibility of the proposed approach on road video data as well as in other applications and reveals that it compares favorably to the layer-wise relevance propagation approach, i.e. it obtains similar visualization results and simultaneously achieves order-of-magnitude speed-ups.
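
The abstract does not spell out the mechanics, so the following is only a loose sketch of one way to turn the stated intuition into a mask: average the feature maps of each layer over channels, then walk from the deepest layer back to the shallowest, upsampling the running mask and multiplying it pointwise with the next averaged map. Nearest-neighbour upsampling by integer factors is used here as a stand-in for whatever upscaling the real method uses, and `upsample_to` and `relevance_mask` are hypothetical names.

```python
import numpy as np

def upsample_to(mask, shape):
    """Nearest-neighbour upsampling by integer factors (illustrative stand-in)."""
    fy = shape[0] // mask.shape[0]
    fx = shape[1] // mask.shape[1]
    return np.kron(mask, np.ones((fy, fx)))

def relevance_mask(feature_maps):
    """Combine per-layer activations into one input-resolution mask.

    feature_maps: list of non-negative arrays shaped (channels, H, W),
    ordered from the shallowest to the deepest layer; each spatial size
    is assumed to be an integer multiple of the next one.
    """
    averaged = [fm.mean(axis=0) for fm in feature_maps]
    mask = averaged[-1]
    for prev in reversed(averaged[:-1]):
        # Deeper evidence gates shallower, higher-resolution evidence.
        mask = upsample_to(mask, prev.shape) * prev
    span = mask.max() - mask.min()
    return (mask - mask.min()) / span if span > 0 else mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((4, 8, 8)), rng.random((8, 4, 4)), rng.random((16, 2, 2))]
    print(relevance_mask(maps).shape)
```

Only averaging, upsampling, and elementwise products are involved, which is consistent with the claim that the visualization costs less than a forward pass.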

Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

Apr 25, 2017
Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller

As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers. This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving. Road tests demonstrated that PilotNet can successfully perform lane keeping in a wide variety of driving conditions, regardless of whether lane markings are present or not. The goal of the work described here is to explain what PilotNet learns and how it makes its decisions. To this end we developed a method for determining which elements in the road image most influence PilotNet's steering decision. Results show that PilotNet indeed learns to recognize relevant objects on the road. In addition to learning the obvious features such as lane markings, edges of roads, and other cars, PilotNet learns more subtle features that would be hard to anticipate and program by engineers, for example, bushes lining the edge of the road and atypical vehicle classes.

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

Apr 21, 2017
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
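
The nested-loop structure can be sketched on a toy problem. The inner loop below runs noisy gradient steps (Langevin dynamics) on the loss plus a quadratic term that keeps the inner iterate near the current weights, and keeps a running average of the iterates; the outer loop then moves the weights toward that average. This is a sketch under assumptions: it uses full gradients instead of minibatches, and every hyperparameter name and default value (`eta`, `eta_inner`, `gamma`, `alpha`, `noise`) is illustrative rather than taken from the paper.

```python
import numpy as np

def entropy_sgd(grad_f, x0, outer_steps=50, inner_steps=20, eta=0.5,
                eta_inner=0.1, gamma=1.0, alpha=0.25, noise=1e-3, seed=0):
    """Toy two-loop optimizer in the spirit of Entropy-SGD (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_steps):
        x_inner = x.copy()
        mu = x.copy()  # running average of the inner iterates
        for _ in range(inner_steps):
            # Gradient of f(x') + (gamma / 2) * ||x - x'||^2 with respect to x'.
            g = grad_f(x_inner) - gamma * (x - x_inner)
            x_inner = (x_inner - eta_inner * g
                       + np.sqrt(eta_inner) * noise * rng.standard_normal(x.shape))
            mu = (1.0 - alpha) * mu + alpha * x_inner
        # gamma * (x - mu) estimates the gradient of the negative local entropy.
        x = x - eta * gamma * (x - mu)
    return x

if __name__ == "__main__":
    # Quadratic bowl f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    print(entropy_sgd(lambda v: v, np.array([3.0, -2.0])))
```

On this convex toy loss the method simply converges to the minimum; the preference for wide valleys described in the abstract only becomes visible on non-convex landscapes.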

* ICLR '17

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

Mar 02, 2017
Yacine Jernite, Anna Choromanska, David Sontag

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of the tree, and although past work showed how to learn the tree structure, it expected that the feature vectors remained static. We provide a novel algorithm to simultaneously perform representation learning for the input data and learning of the hierarchical predictor. Our approach optimizes an objective function which favors balanced and easily-separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate both variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time.

Structured adaptive and random spinners for fast machine learning computations

Nov 26, 2016
Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Nourhan Sakr, Tamas Sarlos, Jamal Atif

We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms, new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the $HD_3HD_2HD_1$ structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications.
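
As a concrete instance of a three-block structured projection, the sketch below applies $HD_3HD_2HD_1$ to a vector, assuming $H$ is the normalized Walsh-Hadamard matrix and each $D_i$ is a diagonal matrix of independent random signs. The transform is computed in $O(n \log n)$ time without ever forming an $n \times n$ matrix. The function names `fwht` and `hd3hd2hd1` are illustrative, and the input length is assumed to be a power of two.

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums
            x[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x / np.sqrt(n)  # scaling makes the transform orthogonal

def hd3hd2hd1(x, signs):
    """Apply H D3 H D2 H D1 to x; signs has shape (3, n) with +/-1 entries."""
    out = np.asarray(x, dtype=float)
    for d in signs:  # D1 first, then D2, then D3
        out = fwht(d * out)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    signs = rng.choice([-1.0, 1.0], size=(3, n))
    x = rng.standard_normal(n)
    y = hd3hd2hd1(x, signs)
    print(np.linalg.norm(x), np.linalg.norm(y))
```

Each factor is orthogonal, so the product is a rotation (possibly with a reflection) and preserves Euclidean norms, which is the property the pseudo-random rotation is meant to share with a dense random rotation.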

* arXiv admin note: substantial text overlap with arXiv:1605.09046
