Huiping Zhuang

Logit Clipping for Robust Learning against Label Noise

Dec 08, 2022
Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, Yixuan Li

In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. Cross Entropy (CE) loss has been shown not to be robust to noisy labels due to its unboundedness. To alleviate this issue, existing works typically design specialized robust losses that satisfy the symmetric condition, which usually leads to underfitting. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, CE loss equipped with our LogitClip method is effectively bounded, mitigating overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of CE loss, but also broadly enhances the generalization performance of popular robust losses.
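
The clipping step itself is simple to sketch. The snippet below is a minimal PyTorch illustration of the idea as stated in the abstract: any logit vector whose norm exceeds a constant tau is rescaled onto the ball of radius tau before the usual CE loss is applied. The function name, the use of the 2-norm, and the default value of tau are assumptions of this sketch, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def logit_clip(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Clamp the 2-norm of each logit vector to be at most tau (illustrative sketch)."""
    norms = logits.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)  # per-example logit norm
    scale = (tau / norms).clamp(max=1.0)                             # shrink only when the norm exceeds tau
    return logits * scale

# Usage: the CE loss computed on clipped logits is bounded, even for noisy labels.
logits = torch.randn(32, 10) * 10        # raw model outputs (batch of 32, 10 classes)
labels = torch.randint(0, 10, (32,))     # labels, possibly noisy
loss = F.cross_entropy(logit_clip(logits, tau=1.5), labels)
```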

ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection

May 30, 2022
Huiping Zhuang, Zhenyu Weng, Renchunzi Xie, Kar-Ann Toh, Zhiping Lin

Class-incremental learning (CIL) learns a classification model from training data of different classes arriving progressively. Existing CIL methods either suffer from serious accuracy loss due to catastrophic forgetting, or invade data privacy by revisiting used exemplars. Inspired by linear learning formulations, we propose analytic class-incremental learning (ACIL), which achieves absolute memorization of past knowledge while avoiding breaches of data privacy (i.e., without storing historical data). Absolute memorization holds in the sense that class-incremental learning with ACIL on present data gives results identical to those of its joint-learning counterpart, which consumes both present and historical samples. This equality is theoretically validated. Data privacy is ensured since no historical data are involved during the learning process. Empirical validations demonstrate ACIL's competitive accuracy, with near-identical results across various incremental task settings (e.g., 5-50 phases). This also allows ACIL to outperform state-of-the-art methods in large-phase scenarios (e.g., 25 and 50 phases).
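
To make the notion of absolute memorization concrete, the sketch below shows how a ridge-regression classifier on top of a frozen feature extractor can be updated recursively with the Woodbury identity, so that each phase-wise update reproduces exactly the weights that joint training on all phases seen so far would give. This is an illustration of the general analytic-learning principle under assumed simplifications (frozen features, one-hot regression targets over the full label space, hypothetical class and method names), not the authors' implementation.

```python
import numpy as np

class RecursiveRidgeClassifier:
    """Illustrative sketch: exact incremental updating via recursive least squares."""

    def __init__(self, feature_dim: int, num_classes: int, gamma: float = 1.0):
        self.R = np.eye(feature_dim) / gamma            # inverse of the regularized correlation matrix
        self.W = np.zeros((feature_dim, num_classes))   # linear classifier weights

    def update(self, X: np.ndarray, Y: np.ndarray) -> None:
        """X: (n, feature_dim) features of the current phase; Y: (n, num_classes) one-hot labels."""
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)   # Woodbury correction term
        self.R = self.R - self.R @ X.T @ K @ X @ self.R            # update the inverse correlation matrix
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)          # update weights using the new R

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X @ self.W).argmax(axis=1)
```

Calling update phase by phase and solving the joint ridge problem on the concatenated features yield the same W up to numerical precision, which is the sense in which incremental and joint learning coincide here; no past samples are stored between phases.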

Analytic Learning of Convolutional Neural Network For Pattern Recognition

Feb 14, 2022
Huiping Zhuang, Zhiping Lin, Yimin Yang, Kar-Ann Toh

Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive, particularly in view of the need to visit the dataset multiple times. In contrast, analytic learning attempts to obtain the weights in one epoch. However, existing attempts at analytic learning have considered only the multilayer perceptron (MLP). In this article, we propose analytic convolutional neural network learning (ACnnL). Theoretically, we show that ACnnL builds a closed-form solution similar to its MLP counterpart, but differing in its regularization constraints. Consequently, we are able to answer, to a certain extent, why CNNs usually generalize better than MLPs from the implicit regularization point of view. ACnnL is validated by conducting classification tasks on several benchmark datasets. Encouragingly, ACnnL trains CNNs significantly faster, with prediction accuracies reasonably close to those obtained with BP. Moreover, our experiments disclose a unique advantage of ACnnL in the small-sample scenario, where training data are scarce or expensive.
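
As a rough illustration of what analytic learning of a convolutional layer can look like, the sketch below fits a convolution kernel in closed form by unfolding input patches (im2col) and solving a regularized least-squares problem in one pass, with no iterative BP. The per-location targets, the padding scheme, and the regularization strength are assumptions of this sketch; ACnnL's actual construction of layer-wise targets and its specific regularization constraints are those described in the paper.

```python
import numpy as np

def analytic_conv_weights(inputs, targets, kernel_size=3, reg=1e-2):
    """Illustrative sketch: fit one conv layer in closed form via im2col + ridge regression.

    inputs : (n, c_in, h, w) input feature maps
    targets: (n, c_out, h, w) desired responses at each spatial location (assumed given)
    Returns a kernel of shape (c_out, c_in, kernel_size, kernel_size).
    """
    n, c_in, h, w = inputs.shape
    c_out = targets.shape[1]
    pad = kernel_size // 2
    padded = np.pad(inputs, ((0, 0), (0, 0), (pad, pad), (pad, pad)))

    # im2col: one row of flattened patch features per spatial position of every image.
    cols, rows = [], []
    for i in range(h):
        for j in range(w):
            patch = padded[:, :, i:i + kernel_size, j:j + kernel_size]
            cols.append(patch.reshape(n, -1))
            rows.append(targets[:, :, i, j])
    X = np.concatenate(cols, axis=0)   # (n*h*w, c_in*k*k)
    Y = np.concatenate(rows, axis=0)   # (n*h*w, c_out)

    # Ridge regression in one shot: W = (X^T X + reg*I)^{-1} X^T Y.
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)
    return W.T.reshape(c_out, c_in, kernel_size, kernel_size)
```

The explicit regularizer reg is what keeps the single-pass solution well posed even when the patch matrix is rank-deficient, which gives a flavor of the kind of regularization constraint the abstract alludes to.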

Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

Dec 03, 2020
Huiping Zhuang, Zhiping Lin, Kar-Ann Toh

Decoupled learning is a branch of model parallelism that parallelizes the training of a network by splitting it depth-wise into multiple modules. Techniques from decoupled learning usually suffer from the stale gradient effect because of their asynchronous implementation, thereby causing performance degradation. In this paper, we propose accumulated decoupled learning (ADL), which incorporates the gradient accumulation technique to mitigate the stale gradient effect. We give both theoretical and empirical evidence of how the gradient staleness can be reduced. We prove that the proposed method converges to critical points, i.e., the gradients converge to 0, in spite of its asynchronous nature. Empirical validation is provided by training deep convolutional neural networks on classification tasks with the CIFAR-10 and ImageNet datasets. ADL is shown to outperform several state-of-the-art methods in these classification tasks, and is the fastest among the compared methods.
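
The accumulation mechanism itself can be sketched in a few lines. The snippet below shows plain gradient accumulation for a single module of a depth-wise split network in a synchronous setting; the module architecture, learning rate, accumulation window, and toy data are illustrative choices, and how ADL interleaves accumulation with the asynchronous inter-layer pipeline is the subject of the paper.

```python
import torch
from torch import nn

torch.manual_seed(0)
module = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(module.parameters(), lr=0.1)
accum_steps = 4   # micro-batches whose gradients are accumulated before each update

# Toy data standing in for the module's incoming micro-batches.
loader = [(torch.randn(32, 512), torch.randint(0, 10, (32,))) for _ in range(16)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.cross_entropy(module(x), y) / accum_steps
    loss.backward()                      # gradients accumulate in the .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()                 # one parameter update per accumulation window
        optimizer.zero_grad()
```

Accumulating over a window means fewer parameter updates per unit of time, so a gradient computed a few micro-batches ago refers to parameters that have drifted less, which is one intuition for why accumulation can reduce the impact of staleness.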

Fully Decoupled Neural Network Learning Using Delayed Gradients

Jun 21, 2019
Huiping Zhuang, Yi Wang, Qinglai Liu, Zhiping Lin

Using back-propagation (BP) to train neural networks requires sequential passing of activations and gradients, which forces the network modules to work in a synchronous fashion. This constraint has been recognized as the lockings (i.e., the forward, backward, and update lockings) inherited from BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The proposed method splits a neural network into multiple modules that are trained independently and asynchronously on different GPUs. We also introduce a gradient shrinking process to reduce the stale gradient effect caused by the delayed gradients. In addition, we prove that the proposed FDG algorithm guarantees statistical convergence during training. Experiments are conducted by training deep convolutional neural networks on classification tasks with benchmark datasets. The proposed FDG is able to train very deep networks (>100 layers) and very large networks (>35 million parameters) with significant speed gains while outperforming state-of-the-art methods and standard BP.
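
A synchronous, single-process simulation of the delayed-gradient idea is sketched below for a two-module split: the upper module updates with fresh gradients, while the lower module consumes the boundary gradient saved from the previous step, scaled by a shrink factor. The two-module split, the constant shrink factor, and the toy data are assumptions of this sketch; the actual FDG trains its modules asynchronously on different GPUs and uses the shrinking process described in the paper.

```python
import torch
from torch import nn

torch.manual_seed(0)
module1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())   # lower module
module2 = nn.Linear(256, 10)                              # upper module
opt1 = torch.optim.SGD(module1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)
shrink = 0.5      # constant shrink factor applied to the delayed gradient (illustrative)

stale = None      # (boundary activation, boundary gradient) kept from the previous step
data = [(torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(8)]

for x, y in data:
    # The lower module first consumes the one-step-delayed, shrunken boundary gradient.
    if stale is not None:
        old_h, old_grad = stale
        opt1.zero_grad()
        old_h.backward(shrink * old_grad)
        opt1.step()

    # Forward through the lower module; cut the autograd graph at the module boundary.
    h = module1(x)
    h_boundary = h.detach().requires_grad_(True)

    # The upper module trains with fresh gradients on the current batch.
    loss = nn.functional.cross_entropy(module2(h_boundary), y)
    opt2.zero_grad()
    loss.backward()
    opt2.step()

    # Save the boundary gradient so the lower module can use it at the next step.
    stale = (h, h_boundary.grad.detach())
```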
