Jinghui Qin

ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion

Oct 11, 2023
Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi

Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate a high-spatial-resolution HSI (HR-HSI) by fusing a low-spatial-resolution HSI with a high-spatial-resolution multispectral image (MSI) using deep neural networks (DNNs), has attracted considerable attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propose ADASR, a novel adversarial automatic data augmentation framework that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion. Our framework is sample-aware and jointly optimizes an augmentor network and two downsampling networks via adversarial learning, so that we can learn more robust downsampling networks for training the upsampling network. Extensive experiments on two classical public hyperspectral datasets demonstrate the effectiveness of ADASR compared with state-of-the-art methods.
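
A minimal sketch (PyTorch) of the kind of adversarial augmentor/downsampler training the abstract describes: an augmentor perturbs training samples to maximize the downsampling networks' fitting error, while the two downsampling networks (spatial and spectral) are trained to minimize it. All module names, shapes, and loss weights here are illustrative assumptions, not the released ADASR code.

```python
import torch
import torch.nn as nn

class Augmentor(nn.Module):
    """Hypothetical augmentor: predicts a small residual perturbation of an HR-HSI sample."""
    def __init__(self, bands):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(bands, bands, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return x + 0.1 * self.net(x)

bands, msi_bands, ratio = 31, 3, 8                            # illustrative sizes
augmentor = Augmentor(bands)
spatial_down = nn.Conv2d(bands, bands, ratio, stride=ratio)   # HR-HSI -> LR-HSI (spatial degradation)
spectral_down = nn.Conv2d(bands, msi_bands, 1)                # HR-HSI -> MSI (spectral degradation)

opt_aug = torch.optim.Adam(augmentor.parameters(), lr=1e-4)
opt_down = torch.optim.Adam(list(spatial_down.parameters()) + list(spectral_down.parameters()), lr=1e-4)
l1 = nn.L1Loss()

def adversarial_step(hr_hsi, lr_hsi, msi):
    # 1) Augmentor step: create harder samples by maximizing the downsamplers' fitting error.
    aug = augmentor(hr_hsi)
    loss = l1(spatial_down(aug), lr_hsi) + l1(spectral_down(aug), msi)
    opt_aug.zero_grad(); (-loss).backward(); opt_aug.step()
    # 2) Downsampler step: fit the augmented samples, yielding more robust degradation models
    #    that can then supervise training of the upsampling (fusion) network.
    aug = augmentor(hr_hsi).detach()
    loss = l1(spatial_down(aug), lr_hsi) + l1(spectral_down(aug), msi)
    opt_down.zero_grad(); loss.backward(); opt_down.step()

adversarial_step(torch.randn(2, bands, 64, 64), torch.randn(2, bands, 8, 8), torch.randn(2, msi_bands, 64, 64))
```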

* This paper has been accepted by IEEE Geoscience and Remote Sensing Letters. Code is released at https://github.com/fangfang11-plog/ADASR 

Understanding Self-attention Mechanism via Dynamical System Perspective

Aug 19, 2023
Zhongzhan Huang, Mingfu Liang, Jinghui Qin, Shanshan Zhong, Liang Lin

The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence and has successfully boosted the performance of different models. However, current explanations of this mechanism are mainly based on intuition and experience, and direct modeling of how the SAM helps performance is still lacking. To mitigate this issue, in this paper, based on the dynamical-system perspective of residual neural networks, we first show that the intrinsic stiffness phenomenon (SP) in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs). Thus, the ability of an NN to measure SP at the feature level is necessary for obtaining high performance and is an important factor in the difficulty of training NNs. Similar to the adaptive step-size methods that are effective for solving stiff ODEs, we show that the SAM is also a stiffness-aware step-size adaptor that can enhance the model's representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values, which provides a new understanding of why and how the SAM can benefit model performance. This novel perspective can also explain the lottery ticket hypothesis in the SAM, yield new quantitative metrics of representational ability, and inspire a new theory-inspired approach, StepNet. Extensive experiments on several popular benchmarks demonstrate that StepNet can extract fine-grained stiffness information and measure SP accurately, leading to significant improvements in various visual tasks.
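
To make the "stiffness-aware step size" reading concrete, here is a small illustrative PyTorch block (not the paper's StepNet): a residual block viewed as an explicit Euler step x_{t+1} = x_t + h * f(x_t), where the step size h is predicted from the features themselves, which is the role the abstract ascribes to the SAM.

```python
import torch
import torch.nn as nn

class AdaptiveStepResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                      # the "vector field" f(x)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.step = nn.Sequential(                   # input-dependent step size h(x)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        fx = self.f(x)
        h = self.step(fx)          # channel-wise attention acts as an adaptive step size
        return x + h * fx          # explicit Euler step with adaptive h

x = torch.randn(2, 64, 32, 32)
print(AdaptiveStepResidualBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```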

* Accepted by ICCV 2023 

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

May 12, 2023
Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

Diffusion models, which have emerged as popular text-to-image generation models, can produce high-quality and content-rich images guided by textual prompts. However, existing models have limited semantic understanding and commonsense reasoning when the input prompts are concise narratives, resulting in low-quality image generation. To improve their capability on narrative prompts, we propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models. To reach this goal, we first collect and annotate a new dataset, SURD, which consists of more than 57,000 semantically corrected multi-modal samples. Each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. We then align the semantic representation of narrative prompts to the complex prompts and transfer knowledge of large language models (LLMs) to our SUR-adapter via knowledge distillation, so that it acquires the powerful semantic understanding and reasoning capabilities needed to build high-quality textual semantic representations for text-to-image generation. We conduct experiments integrating multiple LLMs and popular pre-trained diffusion models to show the effectiveness of our approach in enabling diffusion models to understand and reason over concise natural language without image quality degradation. Our approach makes text-to-image diffusion models easier to use and improves the user experience, demonstrating its potential for further advancing the development of user-friendly text-to-image generation models by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts. The code is released at https://github.com/Qrange-group/SUR-adapter.
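
A rough sketch (PyTorch) of the alignment-plus-distillation idea described above: an adapter refines the text encoder's embedding of a simple narrative prompt toward the embedding of its matched complex keyword prompt and toward an LLM representation. Dimensions, loss forms, and module names are assumptions, not the released SUR-adapter code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NarrativePromptAdapter(nn.Module):
    def __init__(self, dim=768, llm_dim=4096):
        super().__init__()
        self.refine = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.proj_llm = nn.Linear(llm_dim, dim)   # project LLM features into the text-encoder space

    def forward(self, simple_emb, complex_emb, llm_feat):
        refined = simple_emb + self.refine(simple_emb)          # residual adapter on the narrative prompt
        loss_align = F.mse_loss(refined, complex_emb)           # align to the keyword-based prompt
        loss_distill = 1 - F.cosine_similarity(refined, self.proj_llm(llm_feat), dim=-1).mean()
        return refined, loss_align + loss_distill

adapter = NarrativePromptAdapter()
_, loss = adapter(torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 4096))
print(loss.item())
```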

* work in progress 

LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem

May 09, 2023
Shanshan Zhong, Wushao Wen, Jinghui Qin, Qiangpu Chen, Zhongzhan Huang

In computer vision, the performance of deep neural networks (DNNs) is highly related to their feature extraction ability, i.e., the ability to recognize and focus on key pixel regions in an image. However, in this paper, we quantitatively and statistically show that DNNs have a serious attention bias problem on many samples from several popular datasets: (1) position bias: DNNs focus entirely on label-independent regions; (2) range bias: the regions a DNN focuses on are not completely contained within the ideal region. Moreover, we find that existing self-attention modules can alleviate these biases to a certain extent, but the biases remain non-negligible. To further mitigate them, we propose a lightweight sub-attention strategy (LSAS), which utilizes high-order sub-attention modules to improve the original self-attention modules. The effectiveness of LSAS is demonstrated by extensive experiments on widely used benchmark datasets and popular attention networks. We release our code at https://github.com/Qrange-group/LSAS to help other researchers reproduce the results of LSAS.
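
As an illustration of "high-order sub-attention", the sketch below stacks a lightweight sub-attention on top of a squeeze-and-excitation-style channel attention, so the attention vector is itself re-weighted before being applied. This is an assumed simplification; the exact LSAS design is in the linked repository.

```python
import torch
import torch.nn as nn

class SubAttentionSE(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                 nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # lightweight sub-attention that re-weights the first attention vector
        self.fc2 = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                 nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w1 = self.fc1(self.pool(x).view(b, c))   # first-order channel attention
        w2 = self.fc2(w1)                        # sub-attention applied to the attention vector itself
        return x * (w1 * w2).view(b, c, 1, 1)

x = torch.randn(2, 64, 16, 16)
print(SubAttentionSE(64)(x).shape)
```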

ASR: Attention-alike Structural Re-parameterization

Apr 13, 2023
Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

Structural re-parameterization (SRP) is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations. It allows the extra costs incurred during training for performance improvement, such as additional parameters and inference time, to be removed through these transformations at inference, so SRP has great potential for industrial and practical applications. Existing SRP methods have successfully covered many commonly used architectures, such as normalization, pooling, and multi-branch convolution. However, the widely used self-attention modules cannot be directly handled by SRP, because these modules usually act on the backbone network multiplicatively and their outputs are input-dependent during inference, which limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, the Stripe Observation, which reveals that channel-attention values quickly approach constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that achieves SRP for a given network while enjoying the effectiveness of the self-attention mechanism. Extensive experiments on several standard benchmarks demonstrate the effectiveness of ASR in generally improving the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborate model crafting. We also analyze its limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR.
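
The Stripe Observation suggests a concrete re-parameterization: if a channel-attention vector converges to (approximately) a constant a, then scaling the feature map by a can be folded into the preceding convolution at inference, so the attention-like branch costs nothing at deployment. The sketch below illustrates only this folding step under that assumption; it is not the ASR implementation.

```python
import torch
import torch.nn as nn

def fold_constant_attention(conv: nn.Conv2d, a: torch.Tensor) -> nn.Conv2d:
    """Return a conv equivalent to y = a * conv(x), where a is a per-output-channel constant."""
    folded = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding, bias=True)
    with torch.no_grad():
        folded.weight.copy_(conv.weight * a.view(-1, 1, 1, 1))
        bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        folded.bias.copy_(bias * a)
    return folded

conv = nn.Conv2d(16, 32, 3, padding=1)
a = torch.sigmoid(torch.randn(32))          # stands in for the learned constant attention vector
x = torch.randn(1, 16, 8, 8)
print(torch.allclose(a.view(1, -1, 1, 1) * conv(x), fold_constant_attention(conv, a)(x), atol=1e-5))
```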

* Technical report 

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

Dec 06, 2022
Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

Geometry problem solving is a well-recognized testbed for evaluating the high-level multi-modal reasoning capability of deep models. In most existing works, the two main geometry problems, calculation and proving, are usually treated as separate tasks, hindering a deep model from unifying its reasoning capability across multiple math tasks. In essence, however, these two tasks have similar problem representations and overlapping math knowledge, which can improve the understanding and reasoning ability of a deep model on both tasks. Therefore, we construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. Each proving problem is annotated with a multi-step proof with reasons and mathematical expressions. The proof can be easily reformulated as a proving sequence that shares the same format as the annotated program sequence for calculation problems. Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, showing that reasoning ability can be improved on both tasks through the unified formulation. Furthermore, we propose a Mathematical Expression Pretraining (MEP) method that aims to predict the mathematical expressions in the problem solution, thus improving the Geoformer model. Experiments on UniGeo demonstrate that the proposed Geoformer obtains state-of-the-art performance, outperforming the task-specific model NGS by over 5.6% and 3.2% accuracy on calculation and proving problems, respectively.

Deepening Neural Networks Implicitly and Locally via Recurrent Attention Strategy

Oct 27, 2022
Shanshan Zhong, Wushao Wen, Jinghui Qin, Zhongzhan Huang

Growing empirical and theoretical evidence shows that deepening neural networks can effectively improve their performance under suitable training settings. However, deepening the backbone of a neural network inevitably and significantly increases computation and parameter size. To mitigate these problems, we propose a simple-yet-effective Recurrent Attention Strategy (RAS), which implicitly increases the depth of neural networks with lightweight attention modules through local parameter sharing. Extensive experiments on three widely used benchmark datasets demonstrate that RAS improves the performance of neural networks with only a slight addition of parameters and computation, performing favorably against existing well-known attention modules.
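
A minimal sketch of the recurrent-sharing idea: one lightweight attention module whose parameters are reused across several recurrent applications, so the effective depth grows with almost no extra parameters. The module design, placement, and number of steps are illustrative assumptions, not the RAS implementation.

```python
import torch
import torch.nn as nn

class RecurrentAttention(nn.Module):
    def __init__(self, channels, steps=3):
        super().__init__()
        self.steps = steps
        self.attn = nn.Sequential(                 # one shared lightweight attention module
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        for _ in range(self.steps):                # recurrent reuse of the same parameters
            x = x * self.attn(x)
        return x

x = torch.randn(2, 64, 16, 16)
print(RecurrentAttention(64)(x).shape)
```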

* Work in progress 

Causal Inference for Chatting Handoff

Oct 06, 2022
Shanshan Zhong, Jinghui Qin, Zhongzhan Huang, Daifeng Li

Aiming to ensure chatbot quality by predicting chatbot failure and enabling human-agent collaboration, Machine-Human Chatting Handoff (MHCH) has attracted considerable attention from both industry and academia in recent years. However, most existing methods focus mainly on the dialogue context or assist with global satisfaction prediction based on multi-task learning, ignoring the grounded relationships among causal variables such as the user state and labor cost. These variables are significantly associated with handoff decisions, and ignoring them results in prediction bias and increased cost. Therefore, we propose the Causal-Enhance Module (CEM), built by establishing the causal graph of MHCH over these two variables; it is a simple yet effective module that can be easily plugged into existing MHCH methods. For the impact of users, we use the user state to correct the prediction bias according to the causal relationships among the tasks. For the labor cost, we train an auxiliary cost simulator to calculate unbiased labor cost through counterfactual learning, so that the model becomes cost-aware. Extensive experiments on four real-world benchmarks demonstrate the effectiveness of CEM in generally improving the performance of existing MHCH methods without any elaborate model crafting.

* Work in progress 

Switchable Self-attention Module

Sep 13, 2022
Shanshan Zhong, Wushao Wen, Jinghui Qin

The attention mechanism has achieved great success in visual recognition. Many works are devoted to improving the effectiveness of the attention mechanism by carefully designing the structure of the attention operator. These works require extensive experiments to pick out the optimal settings whenever scenarios change, which consumes considerable time and computational resources. In addition, a neural network often contains many layers, and most studies apply the same attention module to enhance different layers, which hinders further improvement of the self-attention mechanism. To address these problems, we propose a self-attention module, SEM. Based on the input of the attention module and a set of alternative attention operators, SEM can automatically decide how to select and integrate attention operators to compute attention maps. The effectiveness of SEM is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
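
An illustrative sketch of "automatically selecting and integrating attention operators": a small input-driven gate mixes several candidate channel descriptors (average, max, and standard-deviation pooling), and the mixed descriptor feeds one excitation MLP. The candidate operators and gate design are assumptions for illustration, not SEM's actual operator set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Linear(channels, 3)           # mixing weights over the 3 candidate operators
        self.excite = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                    nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        flat = x.view(b, c, -1)
        descs = torch.stack([flat.mean(-1), flat.amax(-1), flat.std(-1)], dim=1)  # (b, 3, c)
        g = F.softmax(self.gate(flat.mean(-1)), dim=-1)                           # (b, 3)
        mixed = (g.unsqueeze(-1) * descs).sum(dim=1)                              # integrated descriptor
        return x * self.excite(mixed).view(b, c, 1, 1)

x = torch.randn(2, 64, 16, 16)
print(SwitchableChannelAttention(64)(x).shape)
```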

* Work in progress 

Mix-Pooling Strategy for Attention Mechanism

Aug 22, 2022
Shanshan Zhong, Wushao Wen, Jinghui Qin

Recently, many effective self-attention modules have been proposed to boost model performance by exploiting the internal information of convolutional neural networks in computer vision. In general, many previous works ignore the design of the pooling strategy in the self-attention mechanism, taking global average pooling for granted, which hinders further improvement of the self-attention mechanism. However, we empirically find and verify that a simple linear combination of global max-pooling and global min-pooling can produce pooling strategies that match or exceed the performance of global average pooling. Based on this empirical observation, we propose SPENet, a simple-yet-effective self-attention module that adopts a self-adaptive pooling strategy based on global max-pooling and global min-pooling, together with a lightweight module for producing the attention map. The effectiveness of SPENet is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
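
The mix-pooling observation translates directly into a small module: a learnable linear combination of global max-pooling and global min-pooling feeds a lightweight excitation block that produces the attention map. The sketch below assumes a single scalar mixing coefficient; the actual SPENet parameterization may differ.

```python
import torch
import torch.nn as nn

class MixPoolAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learnable mixing coefficient
        self.excite = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                    nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        flat = x.view(b, c, -1)
        mixed = self.alpha * flat.amax(-1) + (1 - self.alpha) * flat.amin(-1)  # max/min mix-pooling
        return x * self.excite(mixed).view(b, c, 1, 1)

x = torch.randn(2, 64, 16, 16)
print(MixPoolAttention(64)(x).shape)
```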

* Work in progress 