Jian Cao

GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning

Aug 20, 2023
Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Jian Cao, Haibing Guan

Federated Learning (FL) is popular for its privacy-preserving and collaborative learning capabilities. Recently, personalized FL (pFL) has received attention for its ability to address statistical heterogeneity and achieve personalization in FL. However, from the perspective of feature extraction, most existing pFL methods only focus on extracting either global or personalized feature information during local training, which fails to meet the collaborative learning and personalization goals of pFL. To address this, we propose a new pFL method, named GPFL, to simultaneously learn global and personalized feature information on each client. We conduct extensive experiments on six datasets in three statistically heterogeneous settings and show the superiority of GPFL over ten state-of-the-art methods regarding effectiveness, scalability, fairness, stability, and privacy. Moreover, GPFL mitigates overfitting and outperforms the baselines by up to 8.99% in accuracy.

* Accepted by ICCV 2023 
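
A minimal sketch of how a client might learn global and personalized feature information at the same time: a shared backbone feeds both a globally aligned head and a locally kept personalized head, trained jointly. The module names and the loss weighting are illustrative assumptions, not the exact GPFL design.

```python
# Illustrative sketch only, not the GPFL architecture: each client keeps one
# backbone whose features feed a global head (aligned across clients) and a
# personalized head (kept local), trained with a joint objective.
import torch
import torch.nn as nn

class DualFeatureClient(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                                 # shared feature extractor
        self.global_head = nn.Linear(feat_dim, num_classes)      # aggregated across clients
        self.personal_head = nn.Linear(feat_dim, num_classes)    # never leaves the client

    def forward(self, x):
        feat = self.backbone(x)
        return self.global_head(feat), self.personal_head(feat)

def local_step(model, x, y, optimizer, lam=0.5):
    """One local training step combining global and personalized objectives."""
    logits_g, logits_p = model(x)
    loss = nn.functional.cross_entropy(logits_p, y) \
         + lam * nn.functional.cross_entropy(logits_g, y)   # lam is a hypothetical weight
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```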

Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings

Jun 30, 2023
Yuan Zhang, Jian Cao, Ling Zhang, Jue Chen, Wenyu Sun, Yuan Wang

The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, yet dense and redundant in the temporal domain. Although the spiking neural network (SNN), an event-driven neuromorphic model, has the potential to extract spatio-temporal features from event streams, it is neither effective nor efficient on them. Motivated by this, we propose an event-sparsification spiking framework, dubbed Razor SNN, which progressively prunes pointless event frames. Concretely, at the training stage we extend a dynamic mechanism based on global temporal embeddings to reconstruct the features and adaptively emphasize the effect of events. At the inference stage, we eliminate fruitless frames hierarchically according to a binary mask generated from the trained temporal embeddings. Comprehensive experiments demonstrate that Razor SNN consistently achieves competitive performance on four event-based benchmarks: DVS128 Gesture, N-Caltech 101, CIFAR10-DVS, and SHD.

* Accepted by ICANN 2023 (Oral) 
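
As a rough illustration of the pruning idea (not the official Razor SNN code), the sketch below scores event frames with learnable temporal embeddings, re-weights them softly during training, and applies a hard binary mask at inference; the threshold and tensor layout are assumptions.

```python
# Minimal sketch: one learnable temporal embedding per event frame scores how
# informative that frame is; low-scoring frames are dropped before the SNN.
import torch
import torch.nn as nn

class FrameRazor(nn.Module):
    def __init__(self, num_frames: int, threshold: float = 0.5):
        super().__init__()
        self.temporal_embed = nn.Parameter(torch.zeros(num_frames))  # one score per frame
        self.threshold = threshold

    def forward(self, event_frames: torch.Tensor) -> torch.Tensor:
        # event_frames: (T, C, H, W) binned DVS event stream
        scores = torch.sigmoid(self.temporal_embed)                  # (T,)
        if self.training:
            # soft re-weighting during training so gradients flow to the embeddings
            return event_frames * scores.view(-1, 1, 1, 1)
        # hard binary mask at inference: keep only informative frames
        keep = scores > self.threshold
        return event_frames[keep]
```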

Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty

May 04, 2023
Yuan Zhang, Weihua Chen, Yichen Lu, Tao Huang, Xiuyu Sun, Jian Cao

Knowledge distillation is an effective paradigm for boosting the performance of pocket-size models, and when multiple teacher models are available, the student can break through the single-teacher upper limit. However, it is not economical to train diverse teacher models for a single, disposable distillation. In this paper, we introduce a new concept dubbed Avatars for distillation: inference-time ensemble models derived from the teacher. Concretely, (1) for each iteration of distillation training, various Avatars are generated by a perturbation transformation. We validate that Avatars have a higher upper limit of working capacity and teaching ability, helping the student model learn diverse and receptive knowledge perspectives from the teacher model. (2) During distillation, we propose an uncertainty-aware factor, derived from the variance of the statistical differences between the vanilla teacher and the Avatars, to adaptively adjust the Avatars' contribution to knowledge transfer. Avatar Knowledge Distillation (AKD) is fundamentally different from existing methods and refines them from the innovative view of unequal training. Comprehensive experiments demonstrate the effectiveness of our Avatar mechanism, which improves state-of-the-art distillation methods for dense prediction without extra computational cost. AKD brings up to 0.7 AP gains on COCO 2017 for object detection and 1.83 mIoU gains on Cityscapes for semantic segmentation.

* 8 pages 
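
The following sketch illustrates the Avatar idea under stated assumptions: Avatars are produced here by dropout perturbation of the teacher features, and the uncertainty-aware factor is taken from the variance of the teacher-Avatar differences. The perturbation choice and weighting rule are illustrative, not the paper's exact formulation.

```python
# Rough sketch of Avatar-style distillation (assumptions: dropout perturbation,
# exp(-variance) weighting, plain feature-mimicking loss).
import torch
import torch.nn.functional as F

def avatar_distill_loss(student_feat, teacher_feat, num_avatars=4, p=0.1):
    # 1) generate Avatars by perturbing the teacher's features
    avatars = [F.dropout(teacher_feat, p=p, training=True)
               for _ in range(num_avatars)]

    # 2) uncertainty-aware factor from the variance of teacher-Avatar differences
    diffs = torch.stack([a - teacher_feat for a in avatars])   # (K, ...)
    variance = diffs.var(dim=0)                                # per-element variance
    weight = torch.exp(-variance)      # low variance -> confident -> higher weight

    # 3) weighted feature-mimicking loss against each Avatar
    loss = sum((weight * (student_feat - a).pow(2)).mean() for a in avatars)
    return loss / num_avatars
```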

Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding

Apr 14, 2023
Chunyan Xiong, Mengli Lu, Xiaotong Yu, Jian Cao, Zhong Chen, Di Guo, Xiaobo Qu

Soft-thresholding has been widely used in neural networks. Its basic network structure is a two-layer convolutional neural network with soft-thresholding. Because such a network is nonlinear and nonconvex, the training process depends heavily on an appropriate initialization of the network parameters, making it difficult to obtain a globally optimal solution. To address this issue, a convex dual network is designed here. We theoretically analyze the network's convexity and numerically confirm that strong duality holds. This conclusion is further verified in linear fitting and denoising experiments. This work provides a new way to convexify soft-thresholding neural networks.

* 13 pages, 10 figures 
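
For concreteness, here is a minimal sketch of the primal, non-convex network under analysis: a two-layer convolutional network whose activation is the soft-thresholding operator sign(x) * max(|x| - lambda, 0). The convex dual reformulation itself is not reproduced, and the layer sizes are arbitrary.

```python
# Sketch of the two-layer convolutional network with soft-thresholding.
import torch
import torch.nn as nn

def soft_threshold(x: torch.Tensor, lam: float) -> torch.Tensor:
    """sign(x) * max(|x| - lam, 0): shrinks small values to exactly zero."""
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

class TwoLayerSoftThresholdCNN(nn.Module):
    def __init__(self, channels=16, lam=0.1):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size=3, padding=1)  # first layer
        self.head = nn.Conv1d(channels, 1, kernel_size=1)             # second layer
        self.lam = lam

    def forward(self, x):                      # x: (batch, 1, length)
        return self.head(soft_threshold(self.conv(x), self.lam))
```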

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Jan 31, 2023
Xin Dong, Ruize Wu, Chao Xiong, Hai Li, Lei Cheng, Yong He, Shiyou Qian, Jian Cao, Linjian Mo

Multi-task learning (MTL) aims to solve multiple related tasks simultaneously and has seen rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributes this problem to conflicting gradients, in which case useful gradient updates for all tasks need to be selected carefully. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates the gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD explicitly decomposes gradients into task-shared and task-conflict components and adopts a general update rule that avoids interference across all task gradients, allowing the update directions to be guided by the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate the significant improvement GDOD brings to existing MTL models but also show that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics.

* Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022: 386-395  
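
A simplified illustration of gradient manipulation with an orthogonal basis (not the exact GDOD update rule): task gradients are expressed in an orthonormal basis of their span obtained via QR, directions on which all tasks agree in sign are treated as task-shared, and only those are kept in the merged update.

```python
# Simplified sketch: decompose task gradients in an orthonormal basis of their
# span, keep directions all tasks agree on, drop conflicting directions.
import torch

def orthogonal_decomposition_update(task_grads):
    """task_grads: list of flattened gradients, one per task, each of shape (d,)."""
    G = torch.stack(task_grads)                       # (T, d)
    Q, _ = torch.linalg.qr(G.t(), mode="reduced")     # orthonormal basis of span{g_1..g_T}
    coords = G @ Q                                    # (T, r) coordinates of each gradient
    # task-shared directions: every nonzero coordinate has the same sign
    shared = coords.sign().abs().sum(0) == coords.sign().sum(0).abs()   # (r,)
    merged = coords.mean(0) * shared.float()          # keep shared, zero out conflicting
    return Q @ merged                                 # aggregated update direction, (d,)
```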

Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization

Jan 30, 2023
Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Schäfer, Matthias Katzfuss

To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity.
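
A minimal sketch, under stated assumptions, of the SIC parameterization itself: a variational Gaussian whose precision matrix is U U^T with a sparse triangular factor U restricted to a Vecchia-style nearest-neighbor sparsity pattern. The double-KL construction of DKLGP and the ELBO training loop are not reproduced here.

```python
# Sketch of a Gaussian variational family with a sparse inverse Cholesky factor.
import numpy as np
import torch

def nn_sparsity(locs, k):
    """Column i of the (upper-triangular) SIC factor is nonzero at row i and at
    the k nearest previously ordered points (the usual Vecchia-style pattern)."""
    cols = []
    for i in range(len(locs)):
        if i == 0:
            cols.append(np.array([0]))
            continue
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nn = np.sort(np.argsort(d)[:k])
        cols.append(np.append(nn, i))
    return cols

class SICGaussian(torch.nn.Module):
    """Variational N(mu, Sigma) with Sigma^{-1} = U @ U.T, U sparse upper-triangular."""
    def __init__(self, n, cols):
        super().__init__()
        self.cols = cols
        self.mu = torch.nn.Parameter(torch.zeros(n))
        vals = []
        for c in cols:                               # one free parameter per allowed nonzero
            v = torch.zeros(len(c)); v[-1] = 1.0     # unit diagonal at initialization
            vals.append(torch.nn.Parameter(v))
        self.vals = torch.nn.ParameterList(vals)

    def factor(self):
        n = self.mu.numel()
        U = torch.zeros(n, n)
        for i, c in enumerate(self.cols):
            U[torch.as_tensor(c), i] = self.vals[i]
        return U

    def rsample(self):
        # x = mu + U^{-T} eps has covariance (U U^T)^{-1}; the triangular solve is cheap
        eps = torch.randn_like(self.mu)
        U = self.factor()
        return self.mu + torch.linalg.solve_triangular(
            U.t(), eps.unsqueeze(-1), upper=False).squeeze(-1)
```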

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Dec 29, 2022
Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang Dong, Di Guo, Xiaobo Qu

Implicit regularization is an important way to interpret neural networks. Recent theory has started to explain implicit regularization with the model of deep matrix factorization (DMF) by analyzing the trajectory of discrete gradient dynamics during optimization. These discrete dynamics use step sizes that are relatively small but not infinitesimal, and thus fit well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis, which mainly focuses on special regions of the loss landscape such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.

* 15 pages, 8 figures 
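
A toy sketch of the DMF setting described above: a product of factor matrices is fit to partially observed entries of a rank-R matrix with small, discrete gradient steps. Depth, matrix sizes, and the step size are illustrative assumptions.

```python
# Toy deep matrix factorization: fit W = W_depth ... W_1 to observed entries
# of a low-rank matrix with plain gradient descent and watch the effective rank.
import torch

torch.manual_seed(0)
n, rank, depth = 30, 3, 3
target = torch.randn(n, rank) @ torch.randn(rank, n)        # rank-R ground truth
mask = (torch.rand(n, n) < 0.5).float()                     # observed entries

factors = [torch.nn.Parameter(1e-3 * torch.randn(n, n)) for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=0.1)                      # small, discrete steps

for step in range(20001):
    W = factors[0]
    for F_ in factors[1:]:
        W = W @ F_
    loss = ((mask * (W - target)) ** 2).sum() / mask.sum()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 5000 == 0:
        # the theory predicts the effective rank grows stage by stage as saddles are escaped
        print(step, loss.item(),
              torch.linalg.matrix_rank(W.detach(), atol=1e-2).item())
```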

RLogist: Fast Observation Strategy on Whole-slide Images with Deep Reinforcement Learning

Dec 13, 2022
Boxuan Zhao, Jun Zhang, Deheng Ye, Jian Cao, Xiao Han, Qiang Fu, Wei Yang

Whole-slide images (WSI) in computational pathology have high resolution with gigapixel size, but generally contain sparse regions of interest, which leads to weak diagnostic relevance and data inefficiency for most areas of a slide. Most existing methods rely on a multiple instance learning framework that requires densely sampling local patches at high magnification. This limitation is evident at the application stage, as the heavy computation for extracting patch-level features is inevitable. In this paper, we develop RLogist, a benchmarking deep reinforcement learning (DRL) method for fast observation strategy on WSIs. Imitating the diagnostic logic of human pathologists, our RL agent learns how to find regions of observation value and obtain representative features across multiple resolution levels, without having to analyze each part of the WSI at high magnification. We benchmark our method on two whole-slide-level classification tasks: detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer. Experimental results demonstrate that RLogist achieves competitive classification performance compared to typical multiple instance learning algorithms, while following a significantly shorter observation path. In addition, the observation path given by RLogist provides good decision-making interpretability, and its ability to navigate a reading path can potentially be used by pathologists for educational or assistive purposes. Our code is available at: https://github.com/tencent-ailab/RLogist.

* Accepted by AAAI 2023 
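
A deliberately simplified, hypothetical sketch of the general idea (not the RLogist algorithm): a policy scores low-magnification tile features and only a small budget of tiles is re-read at high magnification, yielding a short observation path that can be trained with a policy-gradient reward.

```python
# Hypothetical sketch of a learned observation strategy over WSI tiles.
import torch
import torch.nn as nn

class ObservationPolicy(nn.Module):
    def __init__(self, feat_dim=256, budget=4):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.budget = budget            # number of tiles allowed at high magnification

    def forward(self, low_mag_feats):   # (num_tiles, feat_dim) low-magnification features
        logits = self.score(low_mag_feats).squeeze(-1)
        probs = torch.softmax(logits, dim=0)
        # sample a short observation path instead of reading every tile
        path = torch.multinomial(probs, self.budget, replacement=False)
        return path, probs[path].log().sum()   # log-prob for a REINFORCE-style update

# Usage sketch: reward the policy with the downstream slide-level accuracy and
# update via policy gradient, e.g. loss = -reward * log_prob.
```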

Alternating Deep Low Rank Approach for Exponential Function Reconstruction and Its Biomedical Magnetic Resonance Applications

Nov 24, 2022
Yihui Huang, Zi Wang, Xinlin Zhang, Jian Cao, Zhangren Tu, Di Guo, Xiaobo Qu

The exponential function is a fundamental signal form in general signal processing and biomedical applications, such as magnetic resonance spectroscopy and imaging. Reducing the sampling time of these signals is an important problem. Sub-Nyquist sampling can accelerate signal acquisition but introduces artifacts. Recently, the low-rankness of these exponentials has been used to implicitly constrain deep learning networks through the unrolling of a low-rank Hankel factorization algorithm. However, relying only on the implicit low-rank constraint cannot provide robust reconstruction under, for example, sampling-rate mismatches. In this work, by introducing an explicit low-rank prior to constrain the deep learning, we propose an Alternating Deep Low Rank approach (ADLR) that utilizes deep learning and optimization solvers alternately. The former solver accelerates the reconstruction while the latter corrects the reconstruction error caused by the mismatch. Experiments on both general exponential functions and realistic biomedical magnetic resonance data show that, compared with state-of-the-art methods, ADLR achieves much lower reconstruction error and effectively alleviates the degradation of reconstruction quality under sampling-rate mismatches.

* 14 pages 
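
A minimal sketch of the alternating idea, not the exact ADLR algorithm: a learned network proposes a reconstruction, an optimization step enforces an explicit low-rank prior via truncated SVD of a Hankel matrix, and data consistency is re-imposed on the sampled entries. Shapes, ranks, and iteration counts are assumptions.

```python
# Sketch of alternating between a learned solver and an explicit low-rank projection.
import torch

def hankel(x: torch.Tensor, rows: int) -> torch.Tensor:
    """Build a Hankel matrix H[i, j] = x[i + j] from a 1-D signal x."""
    cols = x.numel() - rows + 1
    return torch.stack([x[i:i + cols] for i in range(rows)])

def low_rank_project(x, rows, rank):
    H = hankel(x, rows)
    U, S, Vh = torch.linalg.svd(H, full_matrices=False)
    H_r = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]    # truncated SVD
    # average anti-diagonals back to a 1-D signal
    n = x.numel()
    out, cnt = torch.zeros(n), torch.zeros(n)
    for i in range(rows):
        out[i:i + H_r.shape[1]] += H_r[i]
        cnt[i:i + H_r.shape[1]] += 1
    return out / cnt

def adlr_like_recon(y, mask, net, rows=8, rank=2, iters=5):
    x = y.clone()
    for _ in range(iters):
        x = net(x)                              # deep-learning solver: fast proposal
        x = low_rank_project(x, rows, rank)     # optimization solver: explicit low-rank prior
        x = torch.where(mask, y, x)             # data consistency on sampled points
    return x
```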