Maojun Zhang

A Unitary Weights Based One-Iteration Quantum Perceptron Algorithm for Non-Ideal Training Sets

Sep 23, 2023
Wenjie Liu, Peipei Gao, Yuxiang Wang, Wenbin Yu, Maojun Zhang

In order to solve the problem of non-ideal training sets (i.e., less-complete or over-complete sets) and to implement one-iteration learning, a novel and efficient quantum perceptron algorithm based on unitary weights is proposed, in which the singular value decomposition of the total weight matrix derived from the training set is computed to make the weight matrix unitary. Example validation on the quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. A performance comparison between our algorithm and other quantum perceptron algorithms demonstrates its advantages in terms of applicability, accuracy, and availability. To further validate the applicability of our algorithm, a quantum composite gate consisting of several basic quantum gates is also illustrated.
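
The central linear-algebra step, turning the accumulated (generally non-unitary) total weight matrix into a unitary one via its singular value decomposition, can be illustrated with a short numerical sketch. This is only a hypothetical NumPy illustration of that idea, not the authors' implementation; the Hadamard-gate example is an assumed test case.

```python
# Hypothetical sketch: given a total weight matrix W accumulated from a
# non-ideal (e.g. over-complete) training set, compute its SVD W = U S V^H
# and drop the singular values to obtain the closest unitary matrix U V^H.
import numpy as np

def nearest_unitary(W: np.ndarray) -> np.ndarray:
    U, _, Vh = np.linalg.svd(W)
    return U @ Vh

# Assumed example: an over-complete training set for the Hadamard gate yields
# a scaled (hence non-unitary) weight matrix; unitarization recovers H exactly.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
W_total = 3.0 * H              # the same pattern accumulated three times
print(np.allclose(nearest_unitary(W_total), H))   # True
```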

* IEEE Access, 2019. 7: p. 36854-36865  
* 12 pages, 5 figures 

Long-term Visual Localization with Mobile Sensors

Apr 16, 2023
Shen Yan, Yu Liu, Long Wang, Zehong Shen, Zhen Peng, Haomin Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou

Despite the remarkable advances in image matching and pose estimation, image-based localization of a camera in a temporally varying outdoor environment remains a challenging problem due to the huge appearance disparity between query and reference images caused by illumination, seasonal, and structural changes. In this work, we propose to leverage additional sensors on a mobile phone, mainly GPS, compass, and gravity sensors, to solve this challenging problem. We show that these mobile sensors provide decent initial poses and effective constraints to reduce the search space in image matching and final pose estimation. With the initial pose, we are also able to devise a direct 2D-3D matching network to efficiently establish 2D-3D correspondences, instead of the tedious 2D-2D matching used in existing systems. As no public dataset exists for the studied problem, we collect a new dataset that provides a variety of mobile sensor data and significant scene appearance variations, and we develop a system to acquire ground-truth poses for query images. We benchmark our method as well as several state-of-the-art baselines and demonstrate the effectiveness of the proposed approach. The code and dataset will be released publicly.
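
As a rough illustration of how coarse sensor readings can shrink the search space before image matching, the sketch below filters reference images by GPS distance and compass heading. The thresholds, data layout, and function names are assumptions for illustration, not the paper's pipeline.

```python
# Hypothetical pre-filtering of reference images with GPS and compass cues:
# keep only references that are close in position and in viewing direction.
from dataclasses import dataclass
import math

@dataclass
class RefImage:
    name: str
    easting: float        # metric map coordinates (e.g. UTM), in metres
    northing: float
    heading_deg: float     # camera yaw relative to north

def candidate_references(refs, q_easting, q_northing, q_heading_deg,
                         max_dist_m=50.0, max_heading_diff_deg=60.0):
    kept = []
    for r in refs:
        dist = math.hypot(r.easting - q_easting, r.northing - q_northing)
        dh = abs((r.heading_deg - q_heading_deg + 180.0) % 360.0 - 180.0)
        if dist <= max_dist_m and dh <= max_heading_diff_deg:
            kept.append(r)
    return kept
```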

Render-and-Compare: Cross-View 6 DoF Localization from Noisy Prior

Feb 13, 2023
Shen Yan, Xiaoya Cheng, Yuxiang Liu, Juelin Zhu, Rouwan Wu, Yu Liu, Maojun Zhang

Despite the significant progress in 6-DoF visual localization, research has mostly been driven by ground-level benchmarks. Compared with aerial oblique photography, ground-level map collection lacks scalability and complete coverage. In this work, we propose to go beyond the traditional ground-level setting and exploit cross-view localization from aerial to ground. We solve this problem by formulating camera pose estimation as an iterative render-and-compare pipeline and by enhancing robustness through augmenting seeds from noisy initial priors. As no public dataset exists for the studied problem, we collect a new dataset that provides a variety of cross-view images from smartphones and drones, and we develop a semi-automatic system to acquire ground-truth poses for query images. We benchmark our method as well as several state-of-the-art baselines and demonstrate that our method outperforms other approaches by a large margin.
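
The iterative render-and-compare idea can be outlined as a simple loop: sample several pose seeds around the noisy prior, render the aerial model from each candidate pose, score each rendering against the query image, and re-sample around the best pose with a shrinking radius. The sketch below is schematic only; render_view, similarity, and perturb are hypothetical placeholders, not components of the paper.

```python
# Schematic render-and-compare loop with seed augmentation around a noisy
# prior pose; the three callables are hypothetical placeholders.
def localize(query_image, scene_model, prior_pose,
             render_view, similarity, perturb,
             num_seeds=16, num_iters=5):
    seeds = [prior_pose] + [perturb(prior_pose, scale=1.0) for _ in range(num_seeds - 1)]
    best_pose, best_score = prior_pose, float("-inf")
    for it in range(num_iters):
        for pose in seeds:
            score = similarity(query_image, render_view(scene_model, pose))
            if score > best_score:
                best_pose, best_score = pose, score
        # Re-sample seeds around the current best estimate with a shrinking radius.
        radius = 1.0 / (it + 2)
        seeds = [best_pose] + [perturb(best_pose, scale=radius) for _ in range(num_seeds - 1)]
    return best_pose
```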

Wireless Image Transmission with Semantic and Security Awareness

Dec 01, 2022
Maojun Zhang, Yang Li, Zezhong Zhang, Guangxu Zhu, Caijun Zhong

Semantic communication is an increasingly popular framework for wireless image transmission due to its high communication efficiency. With the aid of a joint source-and-channel (JSC) encoder implemented by a neural network, semantic communication directly maps original images into symbol sequences containing semantic information. Compared with the traditional separate source and channel coding design used in bit-level communication systems, semantic communication systems are known to be more efficient and accurate, especially in the low signal-to-noise ratio (SNR) regime. This prompts a critical, yet to be tackled, security issue in semantic communication: it becomes easier for an eavesdropper to crack the semantic information, since it can be decoded even over a rather noisy channel. In this letter, we develop a semantic communication framework that accounts for both the efficiency of semantic meaning decoding and the risk of privacy leakage. To achieve this, targeting wireless image transmission, we propose, on the one hand, a JSC autoencoder featuring residual connections for efficient semantic meaning extraction and transmission, and, on the other hand, a data-driven scheme that balances the efficiency-privacy tradeoff. Extensive experimental results are provided to show the effectiveness and robustness of the proposed scheme.
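
To make the JSC idea concrete, the sketch below shows a toy residual autoencoder whose latent symbols pass through an additive white Gaussian noise channel. It is a minimal PyTorch illustration under assumed layer sizes, not the letter's architecture, and it omits the privacy mechanism entirely.

```python
# Toy joint source-and-channel (JSC) autoencoder with a residual block and an
# AWGN channel layer; the design and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class JSCAutoencoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(ch),
        )
        self.decoder = nn.Sequential(
            ResidualBlock(ch),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, img, snr_db=10.0):
        z = self.encoder(img)
        z = z / z.pow(2).mean().sqrt()                 # normalize symbol power to 1
        noise_std = (10.0 ** (-snr_db / 10.0)) ** 0.5  # AWGN at the given SNR
        z_noisy = z + noise_std * torch.randn_like(z)
        return self.decoder(z_noisy)

# recon = JSCAutoencoder()(torch.rand(1, 3, 64, 64), snr_db=5.0)
```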

* Submitted to IEEE WCL for possible publication 

A Deep Learning-Based Framework for Low Complexity Multi-User MIMO Precoding Design

Jul 08, 2022
Maojun Zhang, Jiabao Gao, Caijun Zhong

Using precoding to suppress multi-user interference is a well-known technique for improving spectral efficiency in multi-user multiple-input multiple-output (MU-MIMO) systems, and the pursuit of high-performance, low-complexity precoding methods has been a focus over the last decade. Traditional algorithms, including the zero-forcing (ZF) algorithm and the weighted minimum mean square error (WMMSE) algorithm, fail to achieve a satisfactory trade-off between complexity and performance. In this paper, leveraging the power of deep learning, we propose a low-complexity precoding design framework for MU-MIMO systems. The key idea is to transform the MIMO precoding problem into a multiple-input single-output precoding problem, for which the optimal precoding structure can be obtained in closed form. A customized deep neural network is designed to fit the mapping from the channels to the precoding matrix. In addition, the techniques of input dimensionality reduction, network pruning, and recovery-module compression are used to further improve computational efficiency. Furthermore, the extension to the practical MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) system is studied. Simulation results show that the proposed low-complexity precoding scheme achieves performance similar to the WMMSE algorithm with very low computational complexity.
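
One way to read the framework: for single-antenna users the optimal linear precoder is known to admit a parameterized closed form, w_k proportional to (I + sum_i q_i h_i h_i^H)^{-1} h_k, so a network only needs to output a few scalars per user rather than a full matrix. The sketch below merely evaluates such a closed-form structure from assumed parameters q and powers p; it is not the paper's network or training procedure.

```python
# Closed-form downlink beamforming structure: each user's precoder direction is
# (I + sum_i q_i h_i h_i^H)^{-1} h_k, scaled to power p_k. q and p are placeholders
# standing in for the quantities a learned model would predict.
import numpy as np

def precoders_from_parameters(H, q, p):
    """H: (K, M) channel rows h_k; q, p: length-K non-negative parameters."""
    K, M = H.shape
    A = np.eye(M, dtype=complex)
    for i in range(K):
        h = H[i][:, None]
        A += q[i] * (h @ h.conj().T)
    A_inv = np.linalg.inv(A)
    W = np.zeros((K, M), dtype=complex)
    for k in range(K):
        d = A_inv @ H[k]
        W[k] = np.sqrt(p[k]) * d / np.linalg.norm(d)   # unit-norm direction, power p_k
    return W

rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
W = precoders_from_parameters(H, q=np.ones(4), p=np.full(4, 0.25))
```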

Accelerating Federated Edge Learning via Optimized Probabilistic Device Scheduling

Jul 24, 2021
Maojun Zhang, Guangxu Zhu, Shuai Wang, Jiamo Jiang, Caijun Zhong, Shuguang Cui

The popular federated edge learning (FEEL) framework allows privacy-preserving collaborative model training via frequent exchange of learning updates between edge devices and the server. Due to the constrained bandwidth, only a subset of devices can upload their updates at each communication round. This has led to an active research area in FEEL studying optimal device scheduling policies for minimizing communication time. However, owing to the difficulty of quantifying the exact communication time, prior work in this area has only tackled the problem partially, by considering either the communication rounds or the per-round latency, while the total communication time is determined by both metrics. To close this gap, we make the first attempt in this paper to formulate and solve the communication time minimization problem. We first derive a tight bound to approximate the communication time through a cross-disciplinary effort involving both learning theory for convergence analysis and communication theory for per-round latency analysis. Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed form by solving the approximate communication time minimization problem. It is found that the optimized policy gradually shifts its priority from suppressing the remaining communication rounds to reducing per-round latency as the training process evolves. The effectiveness of the proposed scheme is demonstrated via a use case on collaborative 3D object detection in autonomous driving.
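
A probabilistic scheduling policy of this kind can be emulated in a few lines: each round one device is sampled from a probability vector, and its update is reweighted by the inverse sampling probability so that the aggregated update stays unbiased. The sketch below is a generic illustration with placeholder probabilities, not the optimized policy derived in the paper.

```python
# Generic probabilistic device scheduling: sample one device per round and
# unbias its contribution by 1/(K * p_k); the probabilities are placeholders.
import numpy as np

def run_rounds(local_update_fn, probs, num_rounds, model):
    """local_update_fn(device_id, model) -> update with the same shape as model."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    rng = np.random.default_rng(0)
    K = len(probs)
    for _ in range(num_rounds):
        k = rng.choice(K, p=probs)
        update = local_update_fn(k, model)
        # E[update / (K * p_k)] equals the average update over all K devices.
        model = model + update / (K * probs[k])
    return model
```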

* In Proc. IEEE SPAWC2021 

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

Sep 17, 2020
Shen Yan, Yang Pen, Shiming Lai, Yu Liu, Maojun Zhang

Conventional image retrieval techniques for Structure-from-Motion (SfM) are limited in effectively recognizing repetitive patterns and cannot guarantee the creation of just enough match pairs with high precision and high recall. In this paper, we present a novel retrieval method based on a Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy. We formulate the image retrieval task as a node binary classification problem on graph data: a node is marked as positive if it shares scene overlap with the query image. The key idea is that the local context in feature space around a query image contains rich information about the matchable relation between this image and its neighbors. By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to determine whether nodes in the subgraph have overlapping regions with the query photograph. Experiments demonstrate that our method performs remarkably well on challenging datasets of highly ambiguous and duplicated scenes. Moreover, compared with state-of-the-art matchable retrieval methods, the proposed approach significantly reduces useless attempted matches without sacrificing the accuracy and completeness of reconstruction.
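
The subgraph construction around a query can be illustrated with plain NumPy: take the query's k nearest neighbours in global-descriptor space as nodes and connect node pairs whose descriptors are mutually similar; the GCN then classifies each node as matchable or not. The sketch below covers only this assumed graph-building step, with illustrative parameter choices, not the learned network itself.

```python
# Build a small subgraph around a query descriptor for node classification.
# Descriptors are assumed L2-normalized global image features; the kNN size
# and similarity threshold are illustrative choices, not values from the paper.
import numpy as np

def build_query_subgraph(query_desc, db_descs, k=20, edge_thresh=0.5):
    sims = db_descs @ query_desc                      # cosine similarity to the query
    node_ids = np.argsort(-sims)[:k]                  # k nearest neighbours as nodes
    feats = db_descs[node_ids]
    adj = (feats @ feats.T) > edge_thresh             # connect mutually similar nodes
    np.fill_diagonal(adj, False)
    return node_ids, feats, adj.astype(np.float32)
```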

DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

Jul 13, 2020
Shunjie Dong, Jinlong Zhao, Maojun Zhang, Zhengxue Shi, Jianing Deng, Yiyu Shi, Mei Tian, Cheng Zhuo

Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution and ambiguous borders (e.g., the right ventricular endocardium), existing methods suffer from degraded accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable U-Net (DeU-Net) to fully exploit spatio-temporal information from 3D cardiac MRI video, comprising a Temporal Deformable Aggregation Module (TDAM) and a Deformable Global Position Attention (DGPA) network. First, the TDAM takes a cardiac MRI video clip as input, with temporal information extracted by an offset prediction network. The extracted temporal information is then fused via a temporal aggregation deformable convolution to produce fused feature maps. Furthermore, to aggregate meaningful features, we devise the DGPA network by employing a deformable attention U-Net, which can encode a wide range of multi-dimensional contextual information into global and local features. Experimental results show that DeU-Net achieves state-of-the-art performance on commonly used evaluation metrics, especially for cardiac marginal information (ASSD and HD).
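
The temporal aggregation step can be loosely approximated with torchvision's deformable convolution: per-pixel offsets predicted from the target and neighbouring frame features drive a deformable convolution that warps the neighbour features onto the target frame before fusion. This is an assumed, simplified sketch of the TDAM idea, not the published module.

```python
# Rough sketch of temporal deformable aggregation: predicted offsets drive a
# deformable convolution that aligns a neighbouring frame's features with the
# target frame. Channel sizes and the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class TemporalDeformableAggregation(nn.Module):
    def __init__(self, ch=32, k=3):
        super().__init__()
        # 2 offsets (x, y) per kernel location.
        self.offset_pred = nn.Conv2d(2 * ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(ch, ch, kernel_size=k, padding=k // 2)

    def forward(self, target_feat, neighbor_feat):
        offsets = self.offset_pred(torch.cat([target_feat, neighbor_feat], dim=1))
        aligned = self.deform(neighbor_feat, offsets)
        return (target_feat + aligned) / 2   # simple fusion of aligned features

# feats = torch.rand(1, 32, 64, 64); out = TemporalDeformableAggregation()(feats, feats)
```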

Transferable Semi-supervised Semantic Segmentation

May 09, 2018
Huaxin Xiao, Yunchao Wei, Yu Liu, Maojun Zhang, Jiashi Feng

The performance of deep learning based semantic segmentation models heavily depends on sufficient data with careful annotations. However, even the largest public datasets only provide samples with pixel-level annotations for rather limited semantic categories. Such data scarcity critically limits the scalability and applicability of semantic segmentation models in real applications. In this paper, we propose a novel transferable semi-supervised semantic segmentation model that can transfer the learned segmentation knowledge from a few strong categories with pixel-level annotations to unseen weak categories with only image-level annotations, significantly broadening the applicable territory of deep segmentation models. In particular, the proposed model consists of two complementary and learnable components: a Label transfer Network (L-Net) and a Prediction transfer Network (P-Net). The L-Net learns to transfer segmentation knowledge from the strong categories to images of the weak categories and produces coarse pixel-level semantic maps, by effectively exploiting the similar appearance shared across categories. Meanwhile, the P-Net tailors the transferred knowledge through a carefully designed adversarial learning strategy and produces refined segmentation results with better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% of the performance of the fully supervised baseline when using pixel-level annotations for 50% and 0% of the categories, respectively, on PASCAL VOC 2012. With such a novel transfer mechanism, our proposed model is easily generalizable to a variety of new categories, only requiring image-level annotations, and offers appealing scalability in real applications.

* Minor update of arXiv:1711.06828 

Deep Motion Boundary Detection

Apr 13, 2018
Xiaoqing Yin, Xiyang Dai, Xinchao Wang, Maojun Zhang, Dacheng Tao, Larry Davis

Motion boundary detection is a crucial yet challenging problem. Prior methods focus on analyzing the gradients and distributions of optical flow fields, or use hand-crafted features for motion boundary learning. In this paper, we propose the first dedicated end-to-end deep learning approach for motion boundary detection, which we term MoBoNet. We introduce a refinement network structure that takes source input images, initial forward and backward optical flows, and the corresponding warping errors as inputs and produces high-resolution motion boundaries. Furthermore, we show that the obtained motion boundaries, through a fusion sub-network we design, can in turn guide the optical flows to remove artifacts. The proposed MoBoNet is generic and works with any optical flow. Our motion boundary detection and the refined optical flow estimation achieve results superior to the state of the art.
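
One of the inputs mentioned above, the warping error, is simply the per-pixel difference between a frame and its neighbour warped by the estimated flow; a common way to compute it uses grid_sample, as in the generic helper below. The (x, y) pixel-displacement flow layout is an assumption, and this is not code from the paper.

```python
# Compute the warping error |I1 - warp(I2, flow)| used as an input cue for
# motion boundary detection. flow has shape (N, 2, H, W) with per-pixel
# (x, y) displacements in pixels; this is a generic implementation.
import torch
import torch.nn.functional as F

def warping_error(img1, img2, flow):
    n, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(flow.device)   # (2, H, W) base grid
    coords = grid.unsqueeze(0) + flow                             # sampling positions
    # Normalize to [-1, 1] for grid_sample (x first, then y).
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)       # (N, H, W, 2)
    warped = F.grid_sample(img2, sample_grid, align_corners=True)
    return (img1 - warped).abs()
```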

* 17 pages, 5 figures 