Yang Luo

One-stage Modality Distillation for Incomplete Multimodal Learning

Sep 15, 2023
Shicai Wei, Yang Luo, Chunbo Luo

Learning based on multimodal data has attracted increasing interest recently. While a variety of sensory modalities can be collected for training, not all of them are always available in deployment scenarios, which raises the challenge of inference with incomplete modalities. To address this issue, this paper presents a one-stage modality distillation framework that unifies privileged knowledge transfer and modality information fusion into a single optimization procedure via multi-task learning. Compared with conventional modality distillation, which performs the two steps independently, this helps capture valuable representations that directly assist the final model's inference. Specifically, we propose a joint adaptation network for the modality transfer task to preserve the privileged information; it addresses the representation heterogeneity caused by the input discrepancy via joint distribution adaptation. We then introduce a cross translation network for the modality fusion task to aggregate the restored and available modality features; it leverages a parameter-sharing strategy to capture cross-modal cues explicitly. Extensive experiments on RGB-D classification and segmentation tasks demonstrate that the proposed multimodal inheritance framework overcomes the problem of incomplete modality input in various scenes and achieves state-of-the-art performance.
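
Conceptually, the one-stage setup amounts to minimizing a transfer loss and a task loss jointly in every training step, rather than in two separate stages. Below is a minimal PyTorch sketch of that idea; the module interfaces, the MSE stand-in for the joint adaptation objective, and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def one_stage_step(student, teacher, avail_x, full_x, labels, alpha=0.5):
    """One optimization step coupling modality transfer and fusion."""
    with torch.no_grad():
        t_feat = teacher(full_x)            # teacher sees all modalities
    s_feat, logits = student(avail_x)       # student sees available ones only
    transfer = F.mse_loss(s_feat, t_feat)   # stand-in for joint adaptation
    task = F.cross_entropy(logits, labels)  # fusion/inference objective
    return task + alpha * transfer          # single multi-task objective
```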

RGB-T Tracking via Multi-Modal Mutual Prompt Learning

Aug 31, 2023
Yang Luo, Xiqing Guo, Hui Feng, Lei Ao

Object tracking based on the fusion of visible and thermal images, known as RGB-T tracking, has gained increasing attention from researchers in recent years. How to fuse information from the two modalities more comprehensively at lower computational cost remains an open problem. Recently, with the rise of prompt learning in computer vision, knowledge from large vision models can be better transferred to downstream tasks. Considering the strong complementarity between the visible and thermal modalities, we propose a tracking architecture based on mutual prompt learning between the two modalities. We also design a lightweight prompter that incorporates attention mechanisms in two dimensions to transfer information from one modality to the other at low computational cost, embedding it into each layer of the backbone. Extensive experiments demonstrate that our proposed tracking architecture is effective and efficient, achieving state-of-the-art performance while maintaining high running speed.
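
As a rough illustration of what such a prompter could look like, the PyTorch sketch below gates features from one modality with channel and spatial attention and adds the result to the other modality's features. The module structure and hyper-parameters are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LightweightPrompter(nn.Module):
    """Illustrative two-dimension attention prompter (assumed design)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, src, dst):
        prompt = src * self.channel_gate(src)         # channel attention
        prompt = prompt * self.spatial_gate(prompt)   # spatial attention
        return dst + prompt                           # prompt the other modality
```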

* 9 pages, 5 figures, 5 tables 

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Jul 05, 2023
Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You

Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models. Nevertheless, adaptivity requires maintaining second-moment estimates of the per-parameter gradients, which incurs a high extra memory overhead. To solve this problem, several memory-efficient optimizers (e.g., Adafactor) have been proposed to drastically reduce auxiliary memory usage, but at a performance penalty. In this paper, we first study a confidence-guided strategy to reduce the instability of existing memory-efficient optimizers. Based on this strategy, we propose CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods. Extensive experiments demonstrate the training stability and superior performance of CAME across various NLP tasks such as BERT and GPT-2 training. Notably, for BERT pre-training with a large batch size of 32,768, our proposed optimizer attains faster convergence and higher accuracy than Adam. The implementation of CAME is publicly available.
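
To convey the confidence-guided idea, here is a simplified, non-factorized PyTorch sketch: the deviation between the instantaneous update and its momentum serves as an instability signal that rescales the step. CAME itself factorizes these second-moment statistics to save memory, and the state initialization and hyper-parameters below are assumptions; consult the released implementation for the real algorithm.

```python
import torch

def came_like_update(p, grad, state, lr=1e-3,
                     b1=0.9, b2=0.999, b3=0.9999, eps=1e-8):
    """Confidence-guided step (simplified; state tensors start at zero)."""
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2
    u = grad / (state['v'].sqrt() + eps)               # normalized update
    state['m'] = b1 * state['m'] + (1 - b1) * u        # momentum of updates
    state['r'] = b3 * state['r'] + (1 - b3) * (u - state['m']) ** 2  # instability
    p.data.add_(state['m'] / (state['r'].sqrt() + eps), alpha=-lr)
```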

* Accepted by ACL 2023 

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

May 22, 2023
Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capability across various tasks. However, LLM inference comes with significant computational costs. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs themselves. Our approach begins by tapping into the potential of LLMs to accurately perceive and predict the response length with minimal overhead. Leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches. We evaluate our approach on real-world instruction datasets using a LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness. Notably, our method is orthogonal to other inference acceleration techniques, making it a valuable addition to many existing toolkits (e.g., FlashAttention, quantization) for LLM inference.
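
The scheduling step itself is easy to picture: sort queries by predicted response length and cut the sorted list into micro-batches, so each batch pads to a similar length and wastes little computation. The Python sketch below assumes a `predict_len` callable standing in for the LLM-based length predictor; it illustrates the grouping idea, not the full pipeline.

```python
def schedule_by_length(queries, predict_len, micro_batch_size=8):
    """Group queries with similar predicted response lengths."""
    ranked = sorted(queries, key=predict_len)          # shortest first
    return [ranked[i:i + micro_batch_size]             # contiguous micro-batches
            for i in range(0, len(ranked), micro_batch_size)]
```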

RGB-T Tracking Based on Mixed Attention

Apr 18, 2023
Yang Luo, Xiqing Guo, Mingtao Dong, Jin Yu

RGB-T tracking uses images from both the visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality under varying conditions to achieve more robust tracking than single-modality tracking. This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the modalities, referred to as MACFT. In the feature extraction stage, we use separate transformer backbone branches to extract modality-specific and modality-shared information. Mixed attention operations in the backbone enable information interaction and self-enhancement between the template and search images, constructing a robust feature representation that better captures the high-level semantics of the target. In the feature fusion stage, modality-adaptive fusion is achieved through a mixed attention-based modality fusion network, which suppresses noise from the low-quality modality while enhancing information from the dominant one. Evaluation on multiple public RGB-T datasets demonstrates that our tracker outperforms other RGB-T trackers on general evaluation metrics and also adapts to long-term tracking scenarios.
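
To make the fusion idea concrete, here is a minimal PyTorch sketch in which tokens from both modalities attend to each other in a single multi-head attention pass, letting each modality both self-enhance and draw on the other. Dimensions and layer choices are illustrative assumptions, not MACFT's actual fusion network.

```python
import torch
import torch.nn as nn

class MixedAttentionFusion(nn.Module):
    """Joint attention over concatenated RGB and thermal tokens (sketch)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, t_tokens):
        x = torch.cat([rgb_tokens, t_tokens], dim=1)   # (B, N_rgb + N_t, C)
        fused, _ = self.attn(x, x, x)                  # cross- and self-attention
        fused = self.norm(x + fused)
        n = rgb_tokens.size(1)
        return fused[:, :n], fused[:, n:]              # split back per modality
```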

* 14 pages, 10 figures 

MMANet: Margin-aware Distillation and Modality-aware Regularization for Incomplete Multimodal Learning

Apr 17, 2023
Shicai Wei, Yang Luo, Chunbo Luo

Multimodal learning has shown great potential in numerous scenarios and has recently attracted increasing interest. However, it often encounters the problem of missing modality data and thus suffers severe performance degradation in practice. To this end, we propose a general framework called MMANet to assist incomplete multimodal learning. It consists of three components: the deployment network used for inference, the teacher network transferring comprehensive multimodal information to the deployment network, and the regularization network guiding the deployment network to balance weak modality combinations. Specifically, we propose a novel margin-aware distillation (MAD) to assist the information transfer by weighting each sample's contribution with its classification uncertainty. This encourages the deployment network to focus on samples near decision boundaries and acquire a refined inter-class margin. Besides, we design a modality-aware regularization (MAR) algorithm to mine the weak modality combinations and guide the regularization network to calculate the prediction loss for them. This forces the deployment network to adaptively improve its representation ability for the weak modality combinations. Finally, extensive experiments on multimodal classification and segmentation tasks demonstrate that our MMANet significantly outperforms the state of the art. Code is available at: https://github.com/shicaiwei123/MMANet
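
As a sketch of the margin-aware idea, the PyTorch snippet below weights a per-sample distillation term by the teacher's predictive entropy, so uncertain samples near the decision boundary contribute more. The entropy weighting and KL form are a reading of the abstract; MAD's exact formulation may differ (see the linked code).

```python
import torch
import torch.nn.functional as F

def margin_aware_distillation(s_logits, t_logits, T=4.0):
    """Per-sample KL distillation weighted by teacher uncertainty (sketch)."""
    t_prob = F.softmax(t_logits / T, dim=1)
    entropy = -(t_prob * t_prob.clamp_min(1e-8).log()).sum(dim=1)
    weight = entropy / entropy.sum()                    # normalized uncertainty
    kl = F.kl_div(F.log_softmax(s_logits / T, dim=1), t_prob,
                  reduction='none').sum(dim=1)          # per-sample KL
    return (weight * kl).sum() * (T * T)
```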

* 10 pages, 3 figures, CVPR2023 

Binary Classification with Positive Labeling Sources

Aug 02, 2022
Jieyu Zhang, Yujing Wang, Yaming Yang, Yang Luo, Alexander Ratner

To create large numbers of training labels for machine learning models effectively and efficiently, researchers have turned to Weak Supervision (WS), which uses programmatic labeling sources rather than manual annotation. Existing WS work on binary classification typically assumes labeling sources that can assign both positive and negative labels to data in roughly balanced proportions. However, for many tasks of interest with a minority positive class, negative examples may be too diverse for developers to write indicative labeling sources. Thus, in this work, we study the application of WS to binary classification tasks with positive labeling sources only. We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources. On 10 benchmark datasets, we show WEAPO achieves the highest average performance in terms of both the quality of the synthesized labels and the performance of the final classifier supervised with these labels. We incorporated the implementation of WEAPO into WRENCH, an existing benchmarking platform.
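
For intuition only, here is a toy Python aggregation in the spirit of positive-only weak supervision (not WEAPO itself): each labeling source either votes positive or abstains, and an example is labeled positive when enough sources fire.

```python
import numpy as np

def positive_only_labels(L, threshold=0.5):
    """L: (n_examples, n_sources) 0/1 matrix; 1 means a positive source fired.
    Examples covered by enough sources become positive; the rest default
    to negative, reflecting the minority-positive-class assumption."""
    votes = L.mean(axis=1)                 # fraction of sources voting positive
    return (votes >= threshold).astype(int)
```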

* CIKM 2022 (short) 

Deep Learning Based Automatic Modulation Recognition: Models, Datasets, and Challenges

Jul 20, 2022
Fuxin Zhang, Chunbo Luo, Jialang Xu, Yang Luo, FuChun Zheng

Automatic modulation recognition (AMR) detects the modulation scheme of received signals for further signal processing without requiring prior information, and provides an essential function when such information is missing. Recent breakthroughs in deep learning (DL) have laid the foundation for developing high-performance DL-AMR approaches for communications systems. Compared with traditional modulation detection methods, DL-AMR approaches have achieved promising performance, including high recognition accuracy and low false-alarm rates, thanks to the strong feature extraction and classification abilities of deep neural networks. Despite this promising potential, DL-AMR approaches also raise concerns about complexity and explainability, which affect practical deployment in wireless communications systems. This paper presents a review of current DL-AMR research, with a focus on appropriate DL models and benchmark datasets. We further provide comprehensive experiments comparing state-of-the-art models for single-input-single-output (SISO) systems from both accuracy and complexity perspectives, and propose to apply DL-AMR to the new multiple-input-multiple-output (MIMO) scenario with precoding. Finally, existing challenges and possible future research directions are discussed.
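
For readers new to the task, a DL-AMR model is essentially a classifier over raw I/Q samples. The PyTorch sketch below shows a minimal 1-D CNN baseline of the kind such surveys benchmark; the architecture and the 11-class default (as in common RadioML-style datasets) are illustrative assumptions, not taken from the paper's experiments.

```python
import torch
import torch.nn as nn

class TinyAMRNet(nn.Module):
    """Minimal DL-AMR baseline: raw I/Q (2 x L) -> modulation class."""
    def __init__(self, num_classes=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 64, 7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, iq):                       # iq: (B, 2, L)
        return self.classifier(self.features(iq).flatten(1))
```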

Self-distillation Augmented Masked Autoencoders for Histopathological Image Classification

Mar 31, 2022
Yang Luo, Zhineng Chen, Xieping Gao

Self-supervised learning (SSL) has drawn increasing attention in pathological image analysis in recent years. However, prevalent contrastive SSL is suboptimal for feature representation in this scenario due to the homogeneous visual appearance of pathological images. Masked autoencoders (MAE), by contrast, build SSL on a generative paradigm, making them better suited to pathological image modeling. In this paper, we first introduce MAE to pathological image analysis and propose a novel SD-MAE model that enables self-distillation-augmented SSL on top of the raw MAE. Besides the reconstruction loss on masked image patches, SD-MAE imposes a self-distillation loss on visible patches, guiding the encoder to perceive high-level semantics that benefit downstream tasks. We apply SD-MAE to image classification on two pathological and one natural image dataset. Experiments demonstrate that SD-MAE is highly competitive with leading contrastive SSL methods. Our results, pre-trained on a moderate number of pathological images, are also comparable to those of a method pre-trained on two orders of magnitude more images. Our code will be released soon.
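
The objective described above can be summarized as a reconstruction term on masked patches plus a self-distillation term on visible patches. Below is a minimal PyTorch sketch of that combination; the tensor layout, the MSE distance for distillation, and the weight `beta` are assumptions, not SD-MAE's exact formulation.

```python
import torch.nn.functional as F

def sd_mae_loss(pred_pixels, target_pixels, mask,
                student_vis, teacher_vis, beta=1.0):
    """MAE reconstruction on masked patches + self-distillation on visible
    patches. `mask` is 1 for masked patches, 0 for visible ones."""
    rec = ((pred_pixels - target_pixels) ** 2).mean(dim=-1)   # per-patch MSE
    rec = (rec * mask).sum() / mask.sum()                     # masked patches only
    sd = F.mse_loss(student_vis, teacher_vis.detach())        # visible patches
    return rec + beta * sd
```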
