Jennifer Dy

Learning to Prompt for Continual Learning

Dec 16, 2021

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

Abstract: The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting. This work instead presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters, which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments on popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results against rehearsal-based methods even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at https://github.com/google-research/l2p.
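
The abstract says prompts live in a memory space but does not say how they are retrieved for a given input. As a minimal sketch, assuming a key-based lookup in which each prompt is paired with a key vector and the closest keys (by cosine similarity) to an input's query feature are chosen, retrieval could look like the following. The names `select_prompts`, `keys`, `prompts`, and `query` are hypothetical, and the string labels stand in for learnable prompt parameters.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def select_prompts(query, keys, top_n=2):
    """Return indices of the top_n keys most similar to the query (assumed lookup rule)."""
    ranked = sorted(range(len(keys)), key=lambda i: cosine(query, keys[i]), reverse=True)
    return ranked[:top_n]


# Hypothetical memory: one key vector per prompt slot.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
prompts = ["p0", "p1", "p2", "p3"]  # placeholders for learnable prompt tensors

query = [0.9, 0.1]  # placeholder for a feature from a frozen pre-trained model
chosen = select_prompts(query, keys)
selected = [prompts[i] for i in chosen]
print(chosen, selected)
```

Because the lookup depends only on the input's feature, no task identity is needed at test time, which matches the task-agnostic setting the abstract describes.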

Unsupervised Approaches for Out-Of-Distribution Dermoscopic Lesion Detection

Nov 08, 2021

Max Torop, Sandesh Ghimire, Wenqian Liu, Dana H. Brooks, Octavia Camps, Milind Rajadhyaksha, Jennifer Dy, Kivanc Kose

Abstract: Few works have shown the efficacy of unsupervised Out-of-Distribution (OOD) methods on complex medical data. Here, we present preliminary findings of our unsupervised OOD detection algorithm, SimCLR-LOF, as well as a recent state-of-the-art approach (SSD), applied to medical images. SimCLR-LOF learns semantically meaningful features using SimCLR and uses LOF to score whether a test sample is OOD. We evaluated on the multi-source International Skin Imaging Collaboration (ISIC) 2019 dataset, and show results that are competitive with SSD as well as with recent supervised approaches applied to the same data.

* NeurIPS: Medical Imaging Meets NeurIPS Workshop
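
The scoring step can be illustrated with a small Local Outlier Factor (LOF) implementation. This is a sketch under stated assumptions, not the authors' code: the 2-D points stand in for learned SimCLR embeddings, the neighbourhood size `k=3` is arbitrary, and the helper names (`lof_scores`, `_knn`, `_dist`) are hypothetical. It assumes the reference points are distinct.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs in data, ignoring index `skip`."""
    pairs = sorted((_dist(point, p), i) for i, p in enumerate(data) if i != skip)
    return pairs[:k]


def lof_scores(train, queries, k=3):
    """LOF of each query relative to in-distribution reference points `train`."""
    neigh = [_knn(p, train, k, skip=i) for i, p in enumerate(train)]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour

    def lrd(neighbors):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neighbors]
        return len(reach) / sum(reach)

    train_lrd = [lrd(n) for n in neigh]
    scores = []
    for q in queries:
        nq = _knn(q, train, k)
        q_lrd = lrd(nq)
        # Ratio of the neighbours' average density to the query's own density.
        scores.append(sum(train_lrd[j] for _, j in nq) / (len(nq) * q_lrd))
    return scores


# Toy "embeddings": a 3x3 grid as in-distribution data.
train = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(train, [(0.5, 0.5), (8.0, 8.0)], k=3)
print(scores)
```

A score near 1 means the query sits in a region about as dense as its neighbours; a score well above 1 marks it as a candidate OOD sample.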

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space

Sep 29, 2021

Sandesh Ghimire, Aria Masoomi, Jennifer Dy

Abstract: Estimating the Kullback-Leibler (KL) divergence from samples of two distributions is essential in many machine learning problems. Variational methods using a neural network discriminator have been proposed to achieve this task in a scalable manner. However, we noted that most of these methods suffer from high fluctuations (variance) in estimates and instability in training. In this paper, we look at this issue from the perspectives of statistical learning theory and function space complexity to understand why this happens and how to solve it. We argue that the cause of these pathologies is a lack of control over the complexity of the neural network discriminator function, and that they can be mitigated by controlling it. To achieve this objective, we 1) present a novel construction of the discriminator in the Reproducing Kernel Hilbert Space (RKHS), 2) theoretically relate the error probability bound of the KL estimates to the complexity of the discriminator in the RKHS, 3) present a scalable way to control the complexity (RKHS norm) of the discriminator for a reliable estimation of KL divergence, and 4) prove the consistency of the proposed estimator. In three different applications of KL divergence (estimation of KL, estimation of mutual information, and Variational Bayes), we show that by controlling the complexity as developed in the theory, we are able to reduce the variance of KL estimates and stabilize the training.

* Advances in Neural Information Processing Systems 2021
* 27 pages, 3 figures. arXiv admin note: text overlap with arXiv:2002.11187
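
To make "variational estimation with a discriminator" concrete, here is a minimal sketch that assumes the Donsker-Varadhan form, KL(P||Q) >= E_P[f] - log E_Q[exp(f)]; the abstract does not name the bound it uses, so this choice is an assumption. Instead of training a network, the example plugs in the known optimal discriminator for two unit-variance Gaussians, so it only illustrates the estimator, not the RKHS construction. The names `dv_kl_estimate` and `optimal_f` are hypothetical.

```python
import math
import random


def dv_kl_estimate(p_samples, q_samples, f):
    """Donsker-Varadhan estimate: mean of f over P minus log-mean-exp of f over Q."""
    mean_p = sum(f(x) for x in p_samples) / len(p_samples)
    mean_exp_q = sum(math.exp(f(x)) for x in q_samples) / len(q_samples)
    return mean_p - math.log(mean_exp_q)


rng = random.Random(0)
p_samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # P = N(1, 1)
q_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # Q = N(0, 1)


def optimal_f(x):
    # For these two Gaussians, log p(x)/q(x) = x - 0.5.
    return x - 0.5


estimate = dv_kl_estimate(p_samples, q_samples, optimal_f)
true_kl = 0.5  # (1 - 0)^2 / 2 for unit-variance Gaussians
trivial = dv_kl_estimate(p_samples, q_samples, lambda x: 0.0)  # constant f gives 0
print(estimate, true_kl, trivial)
```

The exponential inside the second term is where a poorly controlled discriminator can blow up the estimate: a few large values of `f` on the Q samples dominate the average, which is consistent with the variance problem the abstract describes.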

Deep Bayesian Unsupervised Lifelong Learning

Jun 13, 2021

Tingting Zhao, Zifeng Wang, Aria Masoomi, Jennifer Dy

Abstract: Lifelong Learning (LL) refers to the ability to continually learn and solve new problems with incrementally available information over time while retaining previous knowledge. Much attention has been given lately to Supervised Lifelong Learning (SLL) with a stream of labelled data. In contrast, we focus on resolving challenges in Unsupervised Lifelong Learning (ULL) with streaming unlabelled data when the data distribution and the unknown class labels evolve over time. A Bayesian framework is a natural way to incorporate past knowledge and sequentially update the belief with new data. We develop a fully Bayesian inference framework for ULL with a novel end-to-end Deep Bayesian Unsupervised Lifelong Learning (DBULL) algorithm, which can progressively discover new clusters from unlabelled data without forgetting the past while learning latent representations. To efficiently maintain past knowledge, we develop a novel knowledge preservation mechanism via sufficient statistics of the latent representation for raw data. To detect potential new clusters on the fly, we develop an automatic cluster discovery and redundancy removal strategy in our inference, inspired by nonparametric Bayesian statistics techniques. We demonstrate the effectiveness of our approach using image and text corpora benchmark datasets in both LL and batch settings.

Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness

Jun 04, 2021

Zifeng Wang, Tong Jian, Aria Masoomi, Stratis Ioannidis, Jennifer Dy

Abstract: We investigate the HSIC (Hilbert-Schmidt independence criterion) bottleneck as a regularizer for learning an adversarially robust deep neural network classifier. We show that the HSIC bottleneck enhances robustness to adversarial attacks both theoretically and experimentally. Our experiments on multiple benchmark datasets and architectures demonstrate that incorporating an HSIC bottleneck regularizer attains competitive natural accuracy and improves adversarial robustness, both with and without adversarial examples during training.
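
For readers unfamiliar with HSIC, the quantity being regularized can be estimated from paired samples with the standard biased estimator tr(KHLH)/(n-1)^2, where K and L are Gram matrices and H is the centering matrix. The sketch below shows only that estimator on scalar data; the Gaussian kernel, the bandwidth `sigma=1.0`, and the function names are assumptions for illustration, and nothing here reproduces the paper's training objective.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    trace = sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y_dependent = [v + 0.1 * rng.gauss(0.0, 1.0) for v in x]  # nearly a copy of x
y_independent = [rng.gauss(0.0, 1.0) for _ in range(100)]  # unrelated to x

hsic_dep = hsic(x, y_dependent)
hsic_ind = hsic(x, y_independent)
print(hsic_dep, hsic_ind)
```

The dependent pair yields a clearly larger value than the independent pair, which is the property that lets HSIC serve as a dependence penalty or reward inside a loss.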

On the Sample Complexity of Rank Regression from Pairwise Comparisons

May 04, 2021

Berkan Kadioglu, Peng Tian, Jennifer Dy, Deniz Erdogmus, Stratis Ioannidis

Abstract: We consider a rank regression setting, in which a dataset of $N$ samples with features in $\mathbb{R}^d$ is ranked by an oracle via $M$ pairwise comparisons. Specifically, there exists a latent total ordering of the samples; when presented with a pair of samples, a noisy oracle identifies the one ranked higher with respect to the underlying total ordering. A learner observes a dataset of such comparisons and wishes to regress sample ranks from their features. We show that to learn the model parameters with $\epsilon > 0$ accuracy, it suffices to conduct $M \in \Omega(dN\log^3 N/\epsilon^2)$ comparisons uniformly at random when $N$ is $\Omega(d/\epsilon^2)$.
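
A quick way to read the bound is to look at how the comparison count scales. The sketch below evaluates $dN\log^3 N/\epsilon^2$ with the hidden constant set to 1 and a natural logarithm, both of which are assumptions, so the absolute numbers are not meaningful; only the ratios are. The function name `comparison_budget` is hypothetical.

```python
import math


def comparison_budget(d, n, eps):
    """Growth rate d * N * log(N)^3 / eps^2 with the hidden constant assumed to be 1."""
    return d * n * math.log(n) ** 3 / eps ** 2


base = comparison_budget(d=10, n=1000, eps=0.5)
double_d = comparison_budget(d=20, n=1000, eps=0.5)
half_eps = comparison_budget(d=10, n=1000, eps=0.25)

# Doubling the feature dimension doubles the budget;
# halving the target error quadruples it.
print(double_d / base, half_eps / base)
```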

Machine Learning on Camera Images for Fast mmWave Beamforming

Feb 15, 2021

Batool Salehi, Mauro Belgiovine, Sara Garcia Sanchez, Jennifer Dy, Stratis Ioannidis, Kaushik Chowdhury

Abstract: Perfect alignment in chosen beam sectors at both transmit and receive nodes is required for beamforming in mmWave bands. Current 802.11ad WiFi and emerging 5G cellular standards spend up to several milliseconds exploring different sector combinations to identify the beam pair with the highest SNR. In this paper, we propose a machine learning (ML) approach with two sequential convolutional neural networks (CNN) that uses out-of-band information, in the form of camera images, to (i) rapidly identify the locations of the transmitter and receiver nodes, and then (ii) return the optimal beam pair. We experimentally validate this intriguing concept for indoor settings using the NI 60 GHz mmWave transceiver. Our results reveal that our ML approach reduces beamforming-related exploration time by 93% under different ambient lighting conditions, with an error of less than 1% compared to the time-intensive deterministic method defined by the current standards.

Open-World Class Discovery with Kernel Networks

Dec 13, 2020

Zifeng Wang, Batool Salehi, Andrey Gritsenko, Kaushik Chowdhury, Stratis Ioannidis, Jennifer Dy

Abstract: We study an Open-World Class Discovery problem in which, given labeled training samples from old classes, we need to discover new classes from unlabeled test samples. There are two critical challenges in addressing this paradigm: (a) transferring knowledge from old to new classes, and (b) incorporating knowledge learned from new classes back into the original model. We propose Class Discovery Kernel Network with Expansion (CD-KNet-Exp), a deep learning framework that uses the Hilbert-Schmidt Independence Criterion to bridge supervised and unsupervised information in a systematic way, such that the learned knowledge from old classes is distilled appropriately for discovering new classes. Compared to competing methods, CD-KNet-Exp shows superior performance on three publicly available benchmark datasets and a challenging real-world radio frequency fingerprinting dataset.

* Accepted to the IEEE International Conference on Data Mining 2020 (ICDM'20); Best paper candidate

Learn-Prune-Share for Lifelong Learning

Dec 13, 2020

Zifeng Wang, Tong Jian, Kaushik Chowdhury, Yanzhi Wang, Jennifer Dy, Stratis Ioannidis

Abstract: In lifelong learning, we wish to maintain and update a model (e.g., a neural network classifier) in the presence of new classification tasks that arrive sequentially. In this paper, we propose a learn-prune-share (LPS) algorithm which addresses the challenges of catastrophic forgetting, parsimony, and knowledge reuse simultaneously. LPS splits the network into task-specific partitions via an ADMM-based pruning strategy. This leads to no forgetting, while maintaining parsimony. Moreover, LPS integrates a novel selective knowledge sharing scheme into this ADMM optimization framework. This enables adaptive knowledge sharing in an end-to-end fashion. Comprehensive experimental results on two lifelong learning benchmark datasets and a challenging real-world radio frequency fingerprinting dataset are provided to demonstrate the effectiveness of our approach. Our experiments show that LPS consistently outperforms multiple state-of-the-art competitors.

* Accepted to the IEEE International Conference on Data Mining 2020 (ICDM'20)

Kernel Dependence Network

Nov 09, 2020

Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

Abstract: We propose a greedy strategy to spectrally train a deep network for multi-class classification. Each layer is defined as a composition of linear weights with the feature map of a Gaussian kernel acting as the activation function. At each layer, the linear weights are learned by maximizing the dependence between the layer output and the labels using the Hilbert-Schmidt Independence Criterion (HSIC). By constraining the solution space to the Stiefel manifold, we demonstrate how our network construct (Kernel Dependence Network or KNet) can be solved spectrally while leveraging the eigenvalues to automatically find the width and the depth of the network. We theoretically guarantee the existence of a solution for the global optimum while providing insight into our network's ability to generalize.

* NeurIPS 2020 Workshop (Beyond Backprop)
* arXiv admin note: substantial text overlap with arXiv:2006.08539
