Sungmin Cha

Is Continual Learning Truly Learning Representations Continually?

Jun 16, 2022

Sungmin Cha, Dongsub Shim, Hyunwoo Kim, Moontae Lee, Honglak Lee, Taesup Moon

Abstract: Continual learning (CL) aims to learn from sequentially arriving tasks without forgetting previous tasks. Whereas CL algorithms have tried to achieve higher average test accuracy across all the tasks learned so far, learning continuously useful representations is critical for successful generalization and downstream transfer. To measure representational quality, we re-train only the output layers using a small balanced dataset for all the tasks, evaluating the average accuracy without any biased predictions toward the current task. We also test on several downstream tasks, measuring transfer learning accuracy of the learned representations. By testing our new formalism on ImageNet-100 and ImageNet-1000, we find that using more exemplar memory is the only option to make a meaningful difference in learned representations, and most of the regularization- or distillation-based CL algorithms that use the exemplar memory fail to learn continuously useful representations in class-incremental learning. Surprisingly, unsupervised (or self-supervised) CL with sufficient memory size can achieve comparable performance to the supervised counterparts. Considering non-trivial labeling costs, we claim that finding more efficient unsupervised CL algorithms that minimally use exemplar memory would be the next promising direction for CL research.

* Preprint

Task-Balanced Batch Normalization for Exemplar-based Class-Incremental Learning

Jan 29, 2022

Sungmin Cha, Soonwon Hong, Moontae Lee, Taesup Moon

Abstract: Batch Normalization (BN) is an essential layer for training neural network models in various computer vision tasks. It has been widely used in continual learning scenarios with little discussion, but we find that BN should be carefully applied, particularly for exemplar-memory-based class-incremental learning (CIL). We first show that the empirical mean and variance obtained for normalization in a BN layer become highly biased toward the current task. To tackle the resulting problems in the training and test phases, we propose Task-Balanced Batch Normalization (TBBN). Given each mini-batch imbalanced between the current and previous tasks, TBBN first reshapes and repeats the batch, calculating near task-balanced mean and variance. Second, we show that when the affine transformation parameters of BN are learned from a reshaped feature map, they become less biased toward the current task. Based on our extensive CIL experiments with the CIFAR-100 and ImageNet-100 datasets, we demonstrate that TBBN is easily applicable to most existing exemplar-based CIL algorithms, improving their performance by decreasing the forgetting of previous tasks.
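
To make the bias toward the current task concrete, the sketch below compares the plain mean and variance of an imbalanced mini-batch with statistics in which every task carries equal weight. This is not the authors' TBBN procedure (which reshapes and repeats the batch inside the BN layer); the function names and toy activations are assumptions made purely for illustration.

```python
from statistics import fmean


def plain_stats(values):
    """Mean and (biased) variance over all samples, as a standard BN layer would compute."""
    mean = fmean(values)
    var = fmean((v - mean) ** 2 for v in values)
    return mean, var


def task_balanced_stats(values, task_ids):
    """Mean and variance where every task contributes the same total weight."""
    tasks = sorted(set(task_ids))
    counts = {t: task_ids.count(t) for t in tasks}
    # Each sample is weighted by 1 / (number of tasks * samples in its task).
    weights = [1.0 / (len(tasks) * counts[tid]) for tid in task_ids]
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, var


# Toy 1-D activations (hypothetical): six samples from the current task (id 1)
# and two exemplars from a previous task (id 0).
activations = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 0.0, 0.0]
task_ids = [1, 1, 1, 1, 1, 1, 0, 0]

plain_mean, plain_var = plain_stats(activations)
balanced_mean, balanced_var = task_balanced_stats(activations, task_ids)
```

In this toy batch the plain mean sits closer to the current task's activations, while the balanced mean lies midway between the two tasks.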

* Preprint

Supervised Neural Discrete Universal Denoiser for Adaptive Denoising

Nov 24, 2021

Sungmin Cha, Seonwoo Min, Sungroh Yoon, Taesup Moon

Abstract: We improve the recently developed Neural DUDE, a neural network-based adaptive discrete denoiser, by combining it with the supervised learning framework. Namely, we make the supervised pre-training of Neural DUDE compatible with the adaptive fine-tuning of the parameters based on the given noisy data subject to denoising. As a result, we achieve a significant denoising performance boost compared to the vanilla Neural DUDE, which only carries out the adaptive fine-tuning step with randomly initialized parameters. Moreover, we show that the adaptive fine-tuning makes the algorithm robust, such that a noise-mismatched or blindly trained supervised model can still achieve the performance of the matched model. Furthermore, we make a few algorithmic advancements to make Neural DUDE more scalable and able to deal with multi-dimensional data or data with a larger alphabet size. We systematically show our improvements on two very diverse datasets, binary images and DNA sequences.

* Preprint

Observations on K-image Expansion of Image-Mixing Augmentation for Classification

Oct 08, 2021

Joonhyun Jeong, Sungmin Cha, Youngjoon Yoo, Sangdoo Yun, Taesup Moon, Jongwon Choi

Abstract: Image-mixing augmentations (e.g., Mixup or CutMix), which typically mix two images, have become de facto training tricks for image classification. Despite their huge success on image classification, previous works have not thoroughly investigated the number of images to mix, showing only that a naive K-image expansion degrades performance. This paper derives a new K-image mixing augmentation based on the stick-breaking process under a Dirichlet prior. Through extensive experiments and analysis on classification accuracy, the shape of the loss landscape, and adversarial robustness, we show that our method can train more robust and generalized classifiers than the usual two-image methods. Furthermore, we show that our probabilistic model can measure the sample-wise uncertainty and can boost the efficiency of Network Architecture Search (NAS) with 7x reduced search time.
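
As a minimal sketch of how stick-breaking can produce mixing weights for K images, the code below draws weights from a symmetric Dirichlet distribution and blends K flattened images Mixup-style. The abstract does not specify the paper's actual mixing rule, so the pixel-wise blend, the function names, and the toy data are all assumptions.

```python
import random


def stick_breaking_weights(k, alpha, rng):
    """Draw K mixing weights from a symmetric Dirichlet(alpha) via stick-breaking."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for i in range(k - 1):
        # Fraction of the remaining stick: Beta(alpha, alpha * pieces still to come).
        frac = rng.betavariate(alpha, alpha * (k - 1 - i))
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # the last piece takes whatever is left
    return weights


def mix_images(images, labels, weights):
    """Convex combination of K flattened images and their one-hot labels."""
    mixed_image = [sum(w * img[p] for w, img in zip(weights, images))
                   for p in range(len(images[0]))]
    mixed_label = [sum(w * lab[c] for w, lab in zip(weights, labels))
                   for c in range(len(labels[0]))]
    return mixed_image, mixed_label


rng = random.Random(0)  # fixed seed so the sketch is repeatable
k = 3
images = [[0.0] * 4, [1.0] * 4, [0.5] * 4]  # three hypothetical 4-pixel images
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = stick_breaking_weights(k, alpha=1.0, rng=rng)
mixed_image, mixed_label = mix_images(images, labels, weights)
```

Because the weights are non-negative and sum to one, the mixed label remains a valid probability vector for any K.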

* Preprint

SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

Jul 01, 2021

Sungmin Cha, Beomyoung Kim, Youngjoon Yoo, Taesup Moon

Abstract: We consider a class-incremental semantic segmentation (CISS) problem. While some recently proposed algorithms utilized variants of the knowledge distillation (KD) technique to tackle the problem, they only partially addressed the key additional challenges in CISS that cause catastrophic forgetting, i.e., the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining several techniques tailored for semantic segmentation. More specifically, we make three main contributions: (1) modeling the unknown class within the background class to help learning future classes (help plasticity), (2) freezing the backbone network and past classifiers with binary cross-entropy loss and pseudo-labeling to overcome catastrophic forgetting (help stability), and (3) utilizing tiny exemplar memory for the first time in CISS to improve both plasticity and stability. As a result, we show our method achieves significantly better performance than the recent state-of-the-art baselines on the standard benchmark datasets. Furthermore, we justify our contributions with thorough and extensive ablation analyses and discuss how the nature of the CISS problem differs from that of standard class-incremental learning for classification.

Self-Supervised Iterative Contextual Smoothing for Efficient Adversarial Defense against Gray- and Black-Box Attack

Jun 22, 2021

Sungmin Cha, Naeun Ko, Youngjoon Yoo, Taesup Moon

Abstract: We propose a novel and effective input-transformation-based adversarial defense method against gray- and black-box attacks, which is computationally efficient and does not require any adversarial training or retraining of a classification model. We first show that a very simple iterative Gaussian smoothing can effectively wash out adversarial noise and achieve substantially high robust accuracy. Based on this observation, we propose Self-Supervised Iterative Contextual Smoothing (SSICS), which aims to reconstruct the original discriminative features from the Gaussian-smoothed image in a context-adaptive manner, while still smoothing out the adversarial noise. From the experiments on ImageNet, we show that our SSICS achieves both high standard accuracy and very competitive robust accuracy for gray- and black-box attacks, e.g., transfer-based PGD attacks and score-based attacks. A noteworthy point is that our defense is free of computationally expensive adversarial training, yet can approach its robust accuracy via input transformation.
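
The "very simple iterative Gaussian smoothing" baseline mentioned above can be sketched as repeated convolution with a small Gaussian kernel. The example below is only an illustration on a 1-D signal standing in for an image row; it is not SSICS itself, and the kernel size, sigma, iteration count, and toy perturbation are assumed values.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth_once(signal, kernel):
    """Convolve with the kernel, clamping indices at the borders (edge replication)."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(kernel[j + radius] * signal[min(max(i + j, 0), last)]
            for j in range(-radius, radius + 1))
        for i in range(len(signal))
    ]


def iterative_smoothing(signal, sigma=1.0, radius=2, iterations=3):
    """Apply the same Gaussian smoothing several times in a row."""
    kernel = gaussian_kernel(sigma, radius)
    for _ in range(iterations):
        signal = smooth_once(signal, kernel)
    return signal


# Hypothetical data: a flat signal plus a small high-frequency perturbation.
clean = [0.5] * 8
noisy = [v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(clean)]
smoothed = iterative_smoothing(noisy)
```

Smoothing shrinks the high-frequency perturbation but would also blur genuine detail in a real image, which is the loss that the contextual reconstruction step described in the abstract aims to recover.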

* Preprint version

FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise

May 23, 2021

Jaeseok Byun, Sungmin Cha, Taesup Moon

Abstract: We consider the challenging blind denoising problem for Poisson-Gaussian noise, in which no additional information about clean images or noise level parameters is available. In particular, when only "single" noisy images are available for training a denoiser, the denoising performance of existing methods has not been satisfactory. Recently, the blind pixelwise affine image denoiser (BP-AIDE) was proposed and significantly improved the performance in the above setting, to the extent that it is competitive with denoisers that utilize additional information. However, BP-AIDE suffers from slow inference due to the inefficiency of both its noise level estimation procedure and the blind-spot network (BSN) architecture it uses. To that end, we propose Fast Blind Image Denoiser (FBI-Denoiser) for Poisson-Gaussian noise, which consists of two neural network models: 1) PGE-Net, which estimates Poisson-Gaussian noise parameters 2000 times faster than the conventional methods, and 2) FBI-Net, which realizes a much more efficient BSN for the pixelwise affine denoiser in terms of the number of parameters and inference speed. Consequently, we show that our FBI-Denoiser, blindly trained solely on single noisy images, can achieve state-of-the-art performance on several real-world noisy image benchmark datasets with much faster inference time (x10) compared to BP-AIDE. The official code of our method is available at https://github.com/csm9493/FBI-Denoiser.

* CVPR 2021 camera ready version

CPR: Classifier-Projection Regularization for Continual Learning

Jun 12, 2020

Sungmin Cha, Hsiang Hsu, Flavio P. Calmon, Taesup Moon

Abstract: We propose classifier-projection regularization (CPR), a general yet simple patch that can be applied to existing regularization-based continual learning methods. Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by a classifier's output to the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In our extensive experimental results, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of baseline continual learning methods.
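
The link between entropy maximization and the uniform distribution rests on a standard identity: for a distribution p over K classes, KL(p || uniform) = log K - H(p), so raising the entropy is the same as pulling p toward uniform. The sketch below checks this numerically; the function names and example probabilities are assumptions, and it does not reproduce the paper's full training objective.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl_to_uniform(p):
    """KL divergence from p to the uniform distribution over len(p) classes."""
    k = len(p)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0.0)


# Hypothetical classifier outputs over three classes.
peaked = [0.7, 0.2, 0.1]
flat = [1.0 / 3.0] * 3

# Identity: KL(p || uniform) = log K - H(p).
gap_peaked = kl_to_uniform(peaked) - (math.log(3) - entropy(peaked))
gap_flat = kl_to_uniform(flat) - (math.log(3) - entropy(flat))
```

A confident (peaked) output has lower entropy and therefore a larger divergence from uniform than a flat one.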

Adaptive Group Sparse Regularization for Continual Learning

Mar 30, 2020

Sangwon Jung, Hongjoon Ahn, Sungmin Cha, Taesup Moon

Abstract: We propose a novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL), using two group sparsity-based penalties. Our method selectively employs the two penalties when learning each node based on its importance, which is adaptively updated after learning each new task. By utilizing the proximal gradient descent method for learning, the exact sparsity and freezing of the model are guaranteed, and thus the learner can explicitly control the model capacity as the learning continues. Furthermore, as a critical detail, we re-initialize the weights associated with unimportant nodes after learning each task in order to prevent the negative transfer that causes catastrophic forgetting and to facilitate efficient learning of new tasks. Through extensive experimental results, we show that our AGS-CL uses much less additional memory space for storing the regularization parameters, and that it significantly outperforms several state-of-the-art baselines on representative continual learning benchmarks for both supervised and reinforcement learning tasks.
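
Proximal gradient descent yields exact sparsity because the proximal operator of a group-norm penalty sets an entire group to zero once its norm falls below a threshold. The sketch below shows the generic group soft-thresholding step only; AGS-CL's two penalties and its adaptive node importance are not modeled here, and the names, learning rate, and penalty strength are illustrative assumptions.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||w||_2 for one group of weights."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the whole group becomes exactly zero
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


def proximal_step(groups, grads, lr, lam):
    """One gradient step on the loss, then the proximal map of the penalty."""
    updated = []
    for group, grad in zip(groups, grads):
        stepped = [w - lr * g for w, g in zip(group, grad)]
        updated.append(group_soft_threshold(stepped, lr * lam))
    return updated


# Two hypothetical weight groups (e.g. incoming weights of two nodes),
# with zero loss gradient so only the penalty acts.
groups = [[3.0, 4.0], [0.03, 0.04]]
grads = [[0.0, 0.0], [0.0, 0.0]]
new_groups = proximal_step(groups, grads, lr=0.1, lam=1.0)
```

Here the large group is only shrunk slightly, while the small group is zeroed out entirely, which plain gradient descent on the same penalty would not achieve exactly.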

Uncertainty-based Continual Learning with Adaptive Regularization

May 28, 2019

Hongjoon Ahn, Donggyu Lee, Sungmin Cha, Taesup Moon

Abstract: We introduce a new regularization-based continual learning algorithm, dubbed Uncertainty-regularized Continual Learning (UCL), that stores a much smaller number of additional parameters for regularization terms than recent state-of-the-art methods. Our approach builds upon the Bayesian learning framework, but makes a fresh interpretation of the variational approximation based regularization term and defines a notion of "uncertainty" for each hidden node in the network. The regularization parameter of each weight is then set to be large when the uncertainty of either of the nodes that the weight connects is small, since the weights connected to an important node should be updated less when a new task arrives. Moreover, we add two additional regularization terms: one that promotes freezing the weights that are identified to be important (i.e., certain) for past tasks, and the other that gives flexibility to control the actively learning parameters for a new task by gracefully forgetting what was learned before. As a result, we show that UCL outperforms most recent state-of-the-art baselines on both supervised learning and reinforcement learning benchmarks.
