Hui Lin

Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data

Aug 07, 2023
Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng

Federated Learning aims to learn a global model on the server side that generalizes to all clients in a privacy-preserving manner, by leveraging the local models from different clients. Existing solutions focus on either regularizing the objective functions among clients or improving the aggregation mechanism to improve the model's generalization capability. However, their performance is typically limited by dataset biases, such as heterogeneous data distributions and missing classes. To address this issue, this paper presents a cross-silo prototypical calibration method (FedCSPC), which takes additional prototype information from the clients to learn a unified feature space on the server side. Specifically, FedCSPC first employs the Data Prototypical Modeling (DPM) module to learn data patterns via clustering to aid calibration. Subsequently, the cross-silo prototypical calibration (CSPC) module develops an augmented contrastive learning method to improve the robustness of the calibration, which can effectively project cross-source features into a consistent space while maintaining clear decision boundaries. Moreover, the CSPC module is easy to implement and can be applied in a plug-and-play manner. Experiments were conducted on four datasets, covering performance comparison, ablation study, in-depth analysis, and a case study; the results verify that FedCSPC learns consistent features for samples of the same class across different data sources under the guidance of the calibrated model, which leads to better performance than state-of-the-art methods. The source code has been released at https://github.com/qizhuang-qz/FedCSPC.
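As a rough illustration of the two modules the abstract names, the sketch below (PyTorch, not the released code at the repository above) clusters each class's features into per-client prototypes and then applies a supervised contrastive loss over the prototypes pooled on the server. The clustering choice (k-means), the number of prototypes per class, and the temperature are all assumptions.

```python
# Illustrative sketch of client-side prototype modeling and server-side
# cross-silo prototypical calibration (not the official FedCSPC code).
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def client_prototypes(features, labels, k=2):
    """Cluster each class's features into up to k prototypes (Data Prototypical Modeling)."""
    protos, proto_labels = [], []
    for c in labels.unique():
        feats_c = features[labels == c].detach().cpu().numpy()
        k_c = min(k, len(feats_c))
        centers = KMeans(n_clusters=k_c, n_init=10).fit(feats_c).cluster_centers_
        protos.append(torch.tensor(centers, dtype=torch.float32))
        proto_labels.append(torch.full((k_c,), int(c)))
    return torch.cat(protos), torch.cat(proto_labels)


def calibration_loss(protos, proto_labels, temperature=0.1):
    """Supervised contrastive loss over prototypes gathered from all clients:
    same-class prototypes from different silos are pulled together,
    different-class prototypes are pushed apart."""
    z = F.normalize(protos, dim=1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)                 # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (proto_labels[:, None] == proto_labels[None, :]) & ~self_mask
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```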

LGViT: Dynamic Early Exiting for Accelerating Vision Transformer

Aug 01, 2023
Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen

Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks. Although early exiting is a feasible solution for accelerating inference, most works focus on convolutional neural networks (CNNs) and transformer models in natural language processing (NLP). Moreover, the direct application of early exiting methods to ViTs may result in substantial performance degradation. To tackle this challenge, we systematically investigate the efficacy of early exiting in ViTs and point out that the insufficient feature representations in shallow internal classifiers and the limited ability to capture target semantic information in deep internal classifiers restrict the performance of these methods. We then propose an early exiting framework for general ViTs, termed LGViT, which incorporates heterogeneous exiting heads, namely a local perception head and a global aggregation head, to achieve an efficiency-accuracy trade-off. In particular, we develop a novel two-stage training scheme, including end-to-end training and self-distillation with the backbone frozen, to generate early exiting ViTs and facilitate the fusion of global and local information extracted by the two types of heads. We conduct extensive experiments using three popular ViT backbones on three vision datasets. Results demonstrate that our LGViT can achieve competitive performance with approximately 1.8$\times$ speed-up.

* ACM MM 2023 
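The sketch below shows the generic early-exiting mechanism the paper builds on: internal classifiers attached to intermediate ViT blocks, with inference stopping once a confidence threshold is met. The plain linear heads on the CLS token and the fixed threshold are placeholders, not LGViT's local perception or global aggregation heads.

```python
# Illustrative sketch of confidence-based early exiting over ViT blocks.
import torch
import torch.nn as nn


class EarlyExitViT(nn.Module):
    def __init__(self, blocks, exit_heads, final_head, threshold=0.8):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)          # transformer encoder blocks
        self.exit_heads = nn.ModuleDict(exit_heads)  # e.g. {"3": nn.Linear(d, c), "6": ...}
        self.final_head = final_head                 # classifier after the last block
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, tokens):                       # tokens: (1, N, d); batch size 1 at inference
        for i, block in enumerate(self.blocks):
            tokens = block(tokens)
            key = str(i)
            if key in self.exit_heads:               # internal classifier attached after block i
                probs = self.exit_heads[key](tokens[:, 0]).softmax(dim=-1)
                if probs.max().item() >= self.threshold:
                    return probs, i                  # confident enough: exit early
        return self.final_head(tokens[:, 0]).softmax(dim=-1), len(self.blocks) - 1
```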

Metric Learning-Based Timing Synchronization by Using Lightweight Neural Network

Jul 01, 2023
Chaojin Qing, Na Yang, Shuhai Tang, Chuangui Rao, Jiafan Wang, Hui Lin

Timing synchronization (TS) is one of the key tasks in orthogonal frequency division multiplexing (OFDM) systems. However, multi-path uncertainty degrades TS correctness, causing OFDM systems to suffer from severe inter-symbol interference (ISI). To tackle this issue, we propose a timing-metric learning-based TS method assisted by a lightweight one-dimensional convolutional neural network (1-D CNN). Specifically, the receptive field of the 1-D CNN is designed to extract the metric features from the classic synchronizer. Then, to combat the multi-path uncertainty, we employ the varying delays and gains of the multi-path channel (the characteristics of multi-path uncertainty) to design the timing-metric objective and thus form the training labels. This is fundamentally different from existing timing-metric objectives, which are defined only with respect to the timing synchronization point. Our method substantially increases the completeness of the training data against multi-path uncertainty, since the metric information is fully preserved. In this way, the TS correctness is improved under multi-path uncertainty. Numerical results demonstrate the effectiveness and generalization of the proposed TS method against multi-path uncertainty.

* 4 pages, 3 figures 
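A minimal sketch of the kind of lightweight 1-D CNN described above, which refines the timing metric produced by a classic correlation-based synchronizer; the layer widths, kernel size, and input length are illustrative assumptions rather than the paper's configuration.

```python
# Lightweight 1-D CNN that maps a classic correlation timing metric to a refined
# timing metric (layer sizes are illustrative assumptions, not the paper's network).
import torch
import torch.nn as nn


class TimingMetricCNN(nn.Module):
    def __init__(self, kernel_size=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size, padding="same"),   # receptive field spans the metric peak
            nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size, padding="same"),
        )

    def forward(self, metric):                  # metric: (batch, 1, N) classic synchronizer output
        return self.net(metric).squeeze(1)      # refined timing metric per candidate offset


# Usage: the estimated timing offset is the argmax of the refined metric.
model = TimingMetricCNN()
classic_metric = torch.rand(4, 1, 256)          # e.g., a Schmidl-Cox style correlation metric
timing_offset = model(classic_metric).argmax(dim=-1)
```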

Dance of SNN and ANN: Solving binding problem by combining spike timing and reconstructive attention

Nov 11, 2022
Hao Zheng, Hui Lin, Rong Zhao, Luping Shi

The binding problem is one of the fundamental challenges that prevent artificial neural networks (ANNs) from achieving a compositional understanding of the world like human perception, because disentangled and distributed representations of generative factors can interfere and lead to ambiguity when complex data with multiple objects are presented. In this paper, we propose a brain-inspired hybrid neural network (HNN) that introduces the temporal binding theory, originating from neuroscience, into ANNs by integrating spike timing dynamics (via spiking neural networks, SNNs) with reconstructive attention (via ANNs). Spike timing provides an additional dimension for grouping, while reconstructive feedback coordinates the spikes into temporally coherent states. Through the iterative interaction of the ANN and SNN, the model continuously binds multiple objects at alternate synchronous firing times in the SNN coding space. The effectiveness of the model is evaluated on synthetic datasets of binary images. Through visualization and analysis, we demonstrate that the binding is explainable, soft, flexible, and hierarchical. Notably, the model is trained on single-object datasets without explicit supervision on grouping, yet it successfully binds multiple objects on test datasets, showing its compositional generalization capability. Further results demonstrate its binding ability in dynamic situations.
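The grouping-by-spike-timing idea can be illustrated with a toy example that contains none of the paper's hybrid architecture: if features belonging to the same object fire at similar times within a cycle, objects can be separated simply by clustering firing times. Everything below (the greedy grouping rule and the tolerance value) is a hypothetical illustration.

```python
# Toy illustration of binding-by-synchrony (not the paper's hybrid network):
# features of the same object carry similar spike times, and grouping just
# clusters features by temporal proximity.
import torch


def group_by_spike_time(spike_times, tolerance=0.05):
    """Greedy grouping: features whose sorted spike times differ by less than
    `tolerance` of the cycle are assigned to the same object slot."""
    order = torch.argsort(spike_times)
    groups = torch.zeros_like(spike_times, dtype=torch.long)
    current = 0
    for prev, idx in zip(order[:-1], order[1:]):
        if spike_times[idx] - spike_times[prev] > tolerance:
            current += 1                          # a gap in firing times starts a new group
        groups[idx] = current
    return groups


# Two objects firing at distinct phases of a cycle (times in [0, 1)).
spike_times = torch.tensor([0.10, 0.12, 0.11, 0.60, 0.62, 0.61])
print(group_by_spike_time(spike_times))           # -> tensor([0, 0, 0, 1, 1, 1])
```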

Semi-supervised Crowd Counting via Density Agency

Sep 07, 2022
Hui Lin, Zhiheng Ma, Xiaopeng Hong, Yaowei Wang, Zhou Su

In this paper, we propose a new agency-guided semi-supervised counting approach. First, we build a learnable auxiliary structure, namely the density agency, to bring recognized foreground regional features close to their corresponding density sub-classes (agents) and push background features away. Second, we propose a density-guided contrastive learning loss to consolidate the backbone feature extractor. Third, we build a regression head using a transformer structure to further refine the foreground features. Finally, an efficient noise depression loss is provided to minimize the negative influence of annotation noise. Extensive experiments on four challenging crowd counting datasets demonstrate that our method outperforms state-of-the-art semi-supervised counting methods by a large margin. Code is available.

* This is the accepted version of the Paper & Supp to appear in ACM MM 2022. Please cite the final published version. Code is available at https://github.com/LoraLinH/Semi-supervised-Crowd-Counting-via-Density-Agency 
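A minimal sketch of what a "density agency" could look like: one learnable agent vector per density sub-class and a contrastive objective that pulls each foreground feature toward its own agent and away from the others. The number of agents, feature dimension, and temperature are assumptions, not the authors' settings.

```python
# Illustrative sketch of a learnable density agency with a contrastive pull/push loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DensityAgency(nn.Module):
    def __init__(self, num_agents=4, dim=256, temperature=0.1):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(num_agents, dim))   # one agent per density sub-class
        self.temperature = temperature

    def forward(self, features, agent_ids):
        """features: (N, dim) foreground regional features;
        agent_ids: (N,) index of each feature's density sub-class."""
        z = F.normalize(features, dim=1)
        a = F.normalize(self.agents, dim=1)
        logits = z @ a.t() / self.temperature          # similarity to every agent
        return F.cross_entropy(logits, agent_ids)      # pull to own agent, push from the rest
```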

On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

May 21, 2022
Yongjie Wang, Chuan Wang, Ruobing Li, Hui Lin

In recent years, pre-trained models have become dominant in most natural language processing (NLP) tasks. However, in the area of Automated Essay Scoring (AES), pre-trained models such as BERT have not been properly used to outperform other deep learning models such as LSTMs. In this paper, we introduce a novel multi-scale essay representation for BERT that can be jointly learned. We also employ multiple losses and transfer learning from out-of-domain essays to further improve performance. Experimental results show that our approach benefits substantially from the joint learning of the multi-scale essay representation and obtains a near state-of-the-art result among deep learning models on the ASAP task. Our multi-scale essay representation also generalizes well to the CommonLit Readability Prize dataset, which suggests that the novel text representation proposed in this paper may be a new and effective choice for long-text tasks.

* Accepted to NAACL 2022 as a long paper 
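A rough sketch of a multi-scale essay representation with BERT: a document-scale CLS embedding of the (truncated) essay is concatenated with the mean of segment-scale CLS embeddings and fed to a regression head. The segment length, pooling choice, and model name are assumptions; the paper's representation and losses are richer than this.

```python
# Sketch of a two-scale (document + segment) essay scorer built on BERT.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultiScaleEssayScorer(nn.Module):
    def __init__(self, name="bert-base-uncased", seg_len=128):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.bert = AutoModel.from_pretrained(name)
        self.seg_len = seg_len
        self.head = nn.Linear(2 * self.bert.config.hidden_size, 1)   # score regressor

    def encode(self, texts, max_len):
        batch = self.tokenizer(texts, truncation=True, max_length=max_len,
                               padding=True, return_tensors="pt")
        return self.bert(**batch).last_hidden_state[:, 0]            # CLS embeddings

    def forward(self, essay):
        doc = self.encode([essay], max_len=512)                      # document-scale view
        words = essay.split()
        segments = [" ".join(words[i:i + self.seg_len])              # segment-scale views
                    for i in range(0, len(words), self.seg_len)] or [essay]
        seg = self.encode(segments, max_len=self.seg_len).mean(dim=0, keepdim=True)
        return self.head(torch.cat([doc, seg], dim=1)).squeeze(-1)   # predicted essay score
```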

Boosting Crowd Counting via Multifaceted Attention

Mar 05, 2022
Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Xiaopeng Hong

This paper focuses on the challenging crowd counting task. Since large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle this kind of variation well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from a vanilla transformer, learnable local attention, and instance attention into a counting model. First, the Learnable Region Attention (LRA) is proposed to dynamically assign an exclusive attention region to each feature location. Second, we design a Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention maps of different feature locations. Finally, we provide an Instance Attention mechanism to dynamically focus on the most important instances during training. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, validate the proposed method. Code: https://github.com/LoraLinH/Boosting-Crowd-Counting-via-Multifaceted-Attention.

* Accepted by IEEE CVPR 2022. Codes available at: https://github.com/LoraLinH/Boosting-Crowd-Counting-via-Multifaceted-Attention 
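The sketch below conveys the flavor of making the attended region learnable rather than fixed: a per-head learnable locality scale turns pairwise spatial distances into an additive attention bias. This is a simplification under assumed names and forms, not the paper's Learnable Region Attention or its regularization.

```python
# Simplified sketch of attention with a learnable locality bias per head.
import torch
import torch.nn as nn


class LocalityBiasedAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.log_sigma = nn.Parameter(torch.zeros(heads))   # learnable locality scale per head

    def forward(self, tokens, coords):
        """tokens: (B, N, dim) flattened feature map; coords: (N, 2) float pixel positions."""
        dist = torch.cdist(coords, coords)                   # (N, N) pairwise distances
        sigma = self.log_sigma.exp().view(-1, 1, 1)
        bias = -(dist.unsqueeze(0) / sigma) ** 2              # (heads, N, N) Gaussian distance decay
        bias = bias.repeat(tokens.shape[0], 1, 1)              # (B * heads, N, N) additive float mask
        out, attn = self.attn(tokens, tokens, tokens, attn_mask=bias)
        return out, attn
```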

Object Counting: You Only Need to Look at One

Dec 11, 2021
Hui Lin, Xiaopeng Hong, Yabin Wang

This paper aims to tackle the challenging task of one-shot object counting. Given an image containing novel, previously unseen category objects, the goal of the task is to count all instances of the desired category with only one supporting bounding box example. To this end, we propose a counting model with which you only need to Look At One instance (LaoNet). First, a feature correlation module combines Self-Attention and Correlative-Attention modules to learn both inner-relations and inter-relations. It enables the network to be robust to inconsistencies in rotation and size among different instances. Second, a Scale Aggregation mechanism is designed to help extract features with different scale information. Compared with existing few-shot counting methods, LaoNet achieves state-of-the-art results while converging quickly during training. The code will be available soon.

* Keywords: Crowd counting, one-shot object counting, Attention 
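A minimal sketch of the inner-relation / inter-relation idea: image tokens first attend to themselves, then cross-attend to the features of the single exemplar box, and a per-token density head yields the count. Dimensions, head counts, and the density head are illustrative assumptions.

```python
# Sketch of self-attention followed by correlative (cross-) attention for one-shot counting.
import torch
import torch.nn as nn


class OneShotCorrelation(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.density_head = nn.Linear(dim, 1)           # per-token density prediction

    def forward(self, image_tokens, exemplar_tokens):
        """image_tokens: (B, N, dim) flattened image features;
        exemplar_tokens: (B, M, dim) features cropped from the single support box."""
        x, _ = self.self_attn(image_tokens, image_tokens, image_tokens)   # inner-relations
        x, _ = self.cross_attn(x, exemplar_tokens, exemplar_tokens)       # inter-relations
        density = self.density_head(x).relu()            # (B, N, 1) non-negative densities
        return density.sum(dim=(1, 2))                   # predicted count per image
```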

Direct Measure Matching for Crowd Counting

Jul 04, 2021
Hui Lin, Xiaopeng Hong, Zhiheng Ma, Xing Wei, Yunfeng Qiu, Yaowei Wang, Yihong Gong

Traditional crowd counting approaches usually rely on a Gaussian assumption to generate pseudo density ground truth, which suffers from problems such as inaccurate estimation of the Gaussian kernel sizes. In this paper, we propose a new measure-based counting approach that directly regresses the predicted density maps to the scattered point-annotated ground truth. First, crowd counting is formulated as a measure matching problem. Second, we derive a semi-balanced form of the Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching. Third, we propose a self-supervised mechanism by devising a Sinkhorn scale consistency loss to resist scale changes. Finally, an efficient optimization method is provided to minimize the overall loss function. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, validate the proposed method.

* Accepted by International Joint Conference on Artificial Intelligence (IJCAI2021) 
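As a simplified stand-in for the measure-matching view, the sketch below computes a plain (balanced) entropic Sinkhorn cost between the predicted density map, treated as a weighted point cloud on pixel coordinates, and the uniform measure on annotated points. The paper's semi-balanced divergence and scale-consistency loss are not reproduced here, and the regularization strength and iteration count are assumptions.

```python
# Log-domain Sinkhorn (entropic optimal transport) between a predicted density
# map and point annotations; a simplified illustration, not the paper's loss.
import torch


def sinkhorn_loss(pred_weights, pred_coords, gt_coords, eps=0.05, iters=100):
    """pred_weights: (N,) non-negative density mass at each pixel coordinate;
    pred_coords: (N, 2); gt_coords: (M, 2) annotated head positions."""
    a = pred_weights.clamp(min=1e-8)
    a = a / a.sum()                                        # normalize the predicted measure
    b = torch.full((gt_coords.shape[0],), 1.0 / gt_coords.shape[0],
                   device=gt_coords.device)
    C = torch.cdist(pred_coords, gt_coords) ** 2           # squared Euclidean ground cost
    log_a, log_b = a.log(), b.log()
    f, g = torch.zeros_like(a), torch.zeros_like(b)
    for _ in range(iters):                                  # log-domain Sinkhorn updates
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a[:, None], dim=0)
    return (a * f).sum() + (b * g).sum()                    # regularized transport cost
```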