Wei Luo

Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?

May 20, 2023
Bing Liu, Wei Luo, Gang Li, Jing Huang, Bo Yang

As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that these difficulties can be avoided by not using embeddings. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time-series data.
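The embedding-free alternative parametrises the vector field as a per-vertex self-dynamics term plus pairwise interaction terms along the network's edges. A minimal sketch of that additive form, where the specific choices of f and g (linear decay and diffusive coupling) are illustrative placeholders rather than the paper's learned parametrisations:

```python
import numpy as np

def self_dynamics(x):
    # Hypothetical self-dynamics f(x_i); in the paper this component is learned.
    return -x

def interaction(xi, xj):
    # Hypothetical pairwise coupling g(x_i, x_j); also learned in the paper.
    return xj - xi

def vector_field(x, A):
    """Additive decomposition: dx_i/dt = f(x_i) + sum_j A_ij * g(x_i, x_j)."""
    dx = self_dynamics(x).astype(float)
    n = len(x)
    for i in range(n):
        for j in range(n):
            if A[i, j] != 0:
                dx[i] += A[i, j] * interaction(x[i], x[j])
    return dx

def euler_rollout(x0, A, dt=0.01, steps=1000):
    # Forward-Euler integration; long rollouts expose the long-term
    # behaviour that the paper's three tests probe.
    x = x0.astype(float)
    for _ in range(steps):
        x = x + dt * vector_field(x, A)
    return x
```

With decaying self-dynamics and diffusive coupling, any initial state on a connected graph relaxes to the origin, a long-term behaviour the rollout reproduces.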

* Accepted by IJCAI 2023 

Visual Anomaly Detection via Dual-Attention Transformer and Discriminative Flow

Mar 31, 2023
Haiming Yao, Wei Luo, Wenyong Yu

In this paper, we introduce the Dual-Attention Transformer and Discriminative Flow (DADF) framework, a new state of the art for visual anomaly detection. Relying only on knowledge of normal samples, visual anomaly detection has wide applications in industrial scenarios and has attracted significant attention. However, most existing methods fail to meet these requirements. In contrast, the proposed DADF presents a new paradigm: it first leverages a pre-trained network to acquire multi-scale prior embeddings, and then develops a vision Transformer with dual attention mechanisms, namely self-attention and memorial-attention, to achieve two-level reconstruction of the prior embeddings with sequential and normality association. Additionally, we propose using normalizing flows to establish a discriminative likelihood for the joint distribution of priors and reconstructions at each scale. DADF achieves image/pixel AUROC of 98.3/98.4 on MVTec AD, and image AUROC of 83.7 and pixel sPRO of 67.4 on the MVTec LOCO AD benchmark, demonstrating the effectiveness of the proposed approach.
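The discriminative-likelihood idea can be sketched with a single affine normalizing-flow layer (a deliberately minimal stand-in for the paper's flow architecture): the exact log-density is the base log-density plus the log-determinant of the transform, and negative log-likelihood then serves as an anomaly score.

```python
import numpy as np

def affine_flow_logpdf(x, shift, log_scale):
    """Exact log-density under a one-layer affine flow.

    The flow maps x -> z = (x - shift) * exp(-log_scale) with a standard
    normal base density; log|det(dz/dx)| = -sum(log_scale).
    """
    z = (x - shift) * np.exp(-log_scale)
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi)).sum(axis=-1)
    return log_base - log_scale.sum()

def anomaly_score(x, shift, log_scale):
    # Higher negative log-likelihood => feature is less like normal data.
    return -affine_flow_logpdf(x, shift, log_scale)
```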

* Submission to IEEE Transactions on Industrial Informatics 

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

Mar 28, 2023
Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, Xiangke Liao

As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset of each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting in multilingual models, we fix all pre-trained model parameters, insert a parameter-efficient adapter structure, and fine-tune only the adapter. Updating only 0.6% of the parameters that full-model fine-tuning updates for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. Multilingual fine-tuning with 200 samples per programming language approaches the results of fine-tuning with the entire dataset on code summarization. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
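Adapter tuning in this style freezes every pre-trained weight and trains only a small bottleneck inserted into the model. A sketch of the bottleneck computation (dimensions, initialisation, and placement are illustrative; the paper's exact configuration may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Only these weights are trained; the pre-trained model stays frozen.
    """
    def __init__(self, hidden=768, bottleneck=64):
        self.W_down = rng.normal(0, 0.02, (hidden, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Near-zero up-projection so the adapter starts as an identity map.
        self.W_up = np.zeros((bottleneck, hidden))
        self.b_up = np.zeros(hidden)

    def __call__(self, h):
        z = np.maximum(h @ self.W_down + self.b_down, 0.0)  # ReLU bottleneck
        return h + z @ self.W_up + self.b_up                # residual connection

adapter = Adapter()
n_trainable = sum(p.size for p in
                  [adapter.W_down, adapter.b_down, adapter.W_up, adapter.b_up])
print(f"trainable parameters in one adapter: {n_trainable}")
```

A single such bottleneck adds on the order of 10^5 parameters, a tiny fraction of a BERT-scale encoder, which is how the overall update stays below 1% of the model.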

* Accepted to the 45th International Conference on Software Engineering (ICSE 2023) 

Learning Global-Local Correspondence with Semantic Bottleneck for Logical Anomaly Detection

Mar 10, 2023
Haiming Yao, Wenyong Yu, Wei Luo, Zhenfeng Qiang, Donghao Luo, Xiaotian Zhang

This paper presents a novel framework, named the Global-Local Correspondence Framework (GLCF), for visual anomaly detection with logical constraints. Visual anomaly detection has become an active research area in various real-world applications, such as industrial anomaly detection and medical disease diagnosis. However, most existing methods focus on identifying local structural degeneration anomalies and often fail to detect high-level functional anomalies that involve logical constraints. To address this issue, we propose a two-branch approach that consists of a local branch for detecting structural anomalies and a global branch for detecting logical anomalies. To facilitate local-global feature correspondence, we introduce a novel semantic bottleneck enabled by the vision Transformer. Moreover, we develop a feature estimation network for each branch separately to detect anomalies. The proposed framework is validated on various benchmarks, including the industrial datasets MVTec AD and MVTec LOCO AD, as well as the Retinal-OCT medical dataset. Experimental results show that our method outperforms existing methods, particularly in detecting logical anomalies.

* Submission to IEEE Transactions on Circuits and Systems for Video Technology 

CyberLoc: Towards Accurate Long-term Visual Localization

Jan 06, 2023
Liu Liu, Yukai Lin, Xiao Liang, Qichao Xu, Miao Jia, Yangdong Liu, Yuxiang Wen, Wei Luo, Jiangwei Li

This technical report introduces CyberLoc, an image-based visual localization pipeline for robust and accurate long-term pose estimation under challenging conditions. The proposed method comprises four modules connected in sequence. First, a mapping module is applied to build accurate 3D maps of the scene, one map per reference sequence if multiple reference sequences exist under different conditions. Second, a single-image-based localization pipeline (retrieval, matching, PnP) is performed to estimate a 6-DoF camera pose for each query image, one per 3D map. Third, a consensus set maximization module is proposed to filter out outlier 6-DoF camera poses and output a single 6-DoF camera pose per query. Finally, a robust pose refinement module is proposed to optimize 6-DoF query poses, taking as input candidate global 6-DoF camera poses and their corresponding global 2D-3D matches, sparse 2D-2D feature matches between consecutive query images, and SLAM poses of the query sequence. Experiments on the 4Seasons dataset show that our method achieves high accuracy and robustness. In particular, our approach wins the localization challenge of the ECCV 2022 workshop on Map-based Localization for Autonomous Driving (MLAD-ECCV2022).
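The consensus step can be sketched as a simple maximum-consensus vote over the candidate poses: each candidate is scored by how many other candidates agree with it within a threshold, and the best-supported one wins. For brevity this sketch uses only the translation part and a hypothetical 1-metre tolerance; the paper's module operates on full 6-DoF poses.

```python
import numpy as np

def consensus_pose(candidates, tol=1.0):
    """Pick the candidate camera position supported by the most candidates.

    candidates: (k, 3) candidate positions (one per 3D map); a candidate's
    consensus set is every candidate within `tol` metres of it.
    Rotation handling is omitted for brevity.
    """
    best, best_support = None, -1
    for c in candidates:
        support = int((np.linalg.norm(candidates - c, axis=1) < tol).sum())
        if support > best_support:
            best, best_support = c, support
    return best, best_support
```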

* MLAD-ECCV 2022 

Reference-Based Autoencoder for Surface Defect Detection

Nov 18, 2022
Wei Luo, Haiming Yao, Wenyong Yu, Xue Wang

Due to the extreme imbalance between the amounts of normal and abnormal data, visual anomaly detection is important for the development of automatic industrial product-quality inspection. Unsupervised methods based on reconstruction and embedding have been widely studied for anomaly detection, of which reconstruction-based methods are the most popular. However, establishing a unified model for textured surface defect detection remains a challenge, because these surfaces can vary in both homogeneous and irregular ways. Furthermore, existing reconstruction-based methods have limited ability to convert defect features into normal features. To address these challenges, we propose a novel unsupervised reference-based autoencoder (RB-AE) to accurately inspect a variety of textured defects. Unlike most reconstruction-based methods, artificial defects and a novel pixel-level discrimination loss function are utilized for training, enabling the model to obtain pixel-level discrimination ability. First, the RB-AE employs an encoding module to extract multi-scale features of the textured surface. Subsequently, a novel reference-based attention module (RBAM) is proposed to convert defect features into normal features, suppressing the reconstruction of defects. In addition, the RBAM also effectively suppresses the defective feature residual caused by skip connections. Next, a decoding module utilizes the repaired features to reconstruct the normal texture background. Finally, a novel multi-scale feature discrimination module (MSFDM) is employed for defect detection and segmentation.
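The core of the reference-based attention idea can be sketched as cross-attention in which queries come from the (possibly defective) input features while keys and values come from defect-free reference features; every output is then a convex combination of normal features, which suppresses defect reconstruction. The single-head, unprojected form below is a simplification of the module described in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reference_attention(x, ref):
    """Cross-attention with queries from the input and keys/values from a
    defect-free reference, so outputs are mixtures of normal features only.

    x: (n, d) input features; ref: (m, d) reference features.
    """
    scale = np.sqrt(x.shape[-1])
    attn = softmax(x @ ref.T / scale)  # rows sum to 1
    return attn @ ref
```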

* 13 pages 

Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation

Oct 18, 2022
Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong

End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating intermediate transcriptions. However, the training of end-to-end methods relies on parallel ST data, which are difficult and expensive to obtain. Fortunately, the supervised data for automatic speech recognition (ASR) and machine translation (MT) are usually more accessible, making zero-shot speech translation a potential direction. Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space, resulting in much worse performance compared to the supervised ST methods. In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. Specifically, we introduce a vector quantization module to discretize the continuous representations of speech and text into a finite set of virtual tokens, and use ASR data to map corresponding speech and text to the same virtual token in a shared codebook. This way, source language speech can be embedded in the same semantic space as the source language text, which can then be transformed into target language text with an MT module. Experiments on multiple language pairs demonstrate that our zero-shot ST method significantly improves the SOTA, and even performs on par with the strong supervised ST baselines.
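The vector-quantization step can be sketched as a nearest-neighbour lookup in a shared codebook: paired speech and text representations that ASR training has pulled close together snap to the same virtual token. The tiny codebook below is illustrative, not the paper's learned one.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each continuous vector to its nearest codebook entry.

    z: (n, d) continuous speech or text representations
    codebook: (K, d) shared discrete vocabulary
    Returns (token indices, quantized vectors).
    """
    # Squared Euclidean distance from every input to every code.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]
```

When a speech vector and its transcript's text vector land near the same code, both modalities are represented by the identical virtual token, which is what lets the MT module consume speech input zero-shot.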

* Accepted by the main conference of EMNLP 2022 

Time-Optimal Handover Trajectory Planning for Aerial Manipulators based on Discrete Mechanics and Complementarity Constraints

Sep 01, 2022
Wei Luo, Jingshan Chen, Henrik Ebel, Peter Eberhard

Planning a time-optimal trajectory for aerial robots is critical in many drone applications, such as rescue missions and package delivery, and has been widely researched in recent years. However, it still involves several challenges, particularly when it comes to incorporating special task requirements and the aerial robot's dynamics into the planning. In this work, we study a case where an aerial manipulator shall hand over a parcel from a moving mobile robot in a time-optimal manner. Rather than setting up the approach trajectory manually, which makes it difficult to determine the optimal total travel time for accomplishing the desired task within dynamic limits, we propose an optimization framework that combines discrete mechanics and complementarity constraints (DMCC). In the proposed framework, the system dynamics is constrained with discrete variational Lagrangian mechanics, which also provides reliable estimation results according to our experiments. The handover opportunities are automatically determined and arranged based on the desired complementarity constraints. Finally, the performance of the proposed framework is verified with numerical simulations and hardware experiments on our self-designed aerial manipulators.
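Discrete mechanics replaces the continuous Euler-Lagrange equations with their discrete counterpart D2 L_d(q_{k-1}, q_k) + D1 L_d(q_k, q_{k+1}) = 0, solved for q_{k+1}. A sketch for a unit-mass harmonic oscillator with a midpoint discrete Lagrangian (a toy system standing in for the aerial manipulator's dynamics) illustrates why such schemes estimate reliably over long horizons: the resulting update is symplectic, so the trajectory's amplitude stays bounded.

```python
import numpy as np

def del_step(q_prev, q_curr, h):
    """One discrete Euler-Lagrange step for a unit-mass harmonic oscillator.

    Uses the midpoint discrete Lagrangian
        L_d(q0, q1) = h * [((q1 - q0)/h)**2 / 2 - ((q0 + q1)/2)**2 / 2]
    and solves D2 L_d(q_prev, q_curr) + D1 L_d(q_curr, q_next) = 0
    for q_next in closed form.
    """
    return (2.0 / h - h / 2.0) * q_curr / (1.0 / h + h / 4.0) - q_prev

def del_rollout(q0, q1, h, steps):
    # Roll the two-point recursion forward from the first two positions.
    traj = [q0, q1]
    for _ in range(steps):
        traj.append(del_step(traj[-2], traj[-1], h))
    return np.array(traj)
```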


Clear Memory-Augmented Auto-Encoder for Surface Defect Detection

Aug 08, 2022
Wei Luo, Tongzhi Niu, Lixin Tang, Wenyong Yu, Bin Li

In surface defect detection, due to the extreme imbalance between the numbers of positive and negative samples, anomaly detection methods based on positive samples have received more and more attention; among them, reconstruction-based methods are the most popular. However, existing methods struggle either to repair abnormal foregrounds or to reconstruct clear backgrounds. Therefore, we propose a clear memory-augmented auto-encoder (CMA-AE). First, we propose a novel clear memory-augmented module, which combines encoding and memory-encoding through a forgetting-and-input mechanism, thereby repairing abnormal foregrounds while preserving clear backgrounds. Second, a general artificial-anomaly generation algorithm is proposed to simulate anomalies that are as realistic and feature-rich as possible. Finally, we propose a novel multi-scale feature residual detection method for defect segmentation, which makes defect localization more accurate. Comparative experiments against 11 state-of-the-art methods on five benchmark datasets show that CMA-AE achieves an average improvement of 18.6% in F1-measure.
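The memory-augmented part can be sketched with the standard addressing scheme of memory-augmented auto-encoders: an encoding is re-expressed as a sparse combination of stored normal prototypes, so anomalous features cannot simply be copied through to the decoder. The forgetting-and-input combination of encoding and memory-encoding that is specific to CMA-AE is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(z, memory, shrink=0.02):
    """Re-express encodings as combinations of stored normal prototypes.

    z: (n, d) encoder features; memory: (M, d) learned normal patterns.
    Hard shrinkage keeps only strong matches, so defective features are
    rebuilt from normal memory items rather than passed through.
    """
    w = softmax(z @ memory.T)                        # addressing weights (n, M)
    w = np.where(w > shrink, w, 0.0)                 # hard shrinkage
    w = w / (w.sum(axis=1, keepdims=True) + 1e-12)   # renormalize
    return w @ memory
```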

* Submitted to Pattern Recognition 

Feature Transformation for Cross-domain Few-shot Remote Sensing Scene Classification

Mar 04, 2022
Qiaoling Chen, Zhihao Chen, Wei Luo

Effectively classifying remote sensing scenes remains a challenge due to the increasing spatial resolution of remote imaging and the large variance between remote sensing images. Existing research has greatly improved the performance of remote sensing scene classification (RSSC). However, these methods are not applicable to cross-domain few-shot problems, where the target domain has very limited training samples and a data distribution different from that of the source domain. To improve the model's applicability, we propose a feature-wise transformation module (FTM). FTM transfers the feature distribution learned on the source domain to that of the target domain via a very simple affine operation with negligible additional parameters. Moreover, FTM can be effectively learned on the target domain with few training samples available and is agnostic to specific network structures. Experiments on RSSC and land-cover mapping tasks verify its capability to handle cross-domain few-shot problems. Compared with direct fine-tuning, FTM achieves better performance, transferability, and fine-grained discriminability. Code will be publicly available.
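The affine operation can be sketched directly: one learned scale and shift per feature channel, adding only 2c parameters for c channels. Initialising the scale to ones and the shift to zeros makes the module an identity map, so the source-domain backbone is untouched until the few target samples adapt it. The function names below are illustrative.

```python
import numpy as np

def ftm(features, gamma, beta):
    """Feature-wise transformation: per-channel affine map.

    features: (n, c) backbone activations; gamma, beta: (c,) parameters
    learned on the few target-domain samples while the backbone is frozen.
    """
    return gamma * features + beta

def ftm_identity_init(c):
    # Identity initialisation: FTM starts as a no-op on source features.
    return np.ones(c), np.zeros(c)
```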

* 6 pages, 5 figures 