Wei Wang

Neural Image Compression Using Masked Sparse Visual Representation

Sep 20, 2023
Wei Jiang, Wei Wang, Yue Chen

We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks. Because the codebooks are shared with the decoder, the encoder transmits integer codeword indices, which are efficient and robust across platforms, and the decoder uses the indices to retrieve the embedded latent feature for reconstruction. Previous SVR-based compression lacks an effective mechanism for rate-distortion tradeoffs: one can only pursue either high reconstruction quality or low transmission bitrate. We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality. A set of semantic-class-dependent basis codebooks is learned, and a weighted combination of them generates a rich latent feature for high-quality reconstruction. The combining weights are derived adaptively from each input image, providing fidelity information at an additional transmission cost. By masking out unimportant weights in the encoder and recovering them in the decoder, we can trade reconstruction quality for transmission bits, with the masking rate controlling the balance between bitrate and distortion. Experiments on the standard JPEG-AI dataset demonstrate the effectiveness of our M-AdaCode approach.
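
The masking trade-off described above can be illustrated with a toy sketch. It assumes each basis codebook contributes one latent feature vector, that masking drops the smallest-magnitude combining weights at a given masking rate, and that the decoder knows the keep-mask and fills masked weights with a fixed value. The function names and the fill-value recovery are hypothetical stand-ins; the abstract does not say how M-AdaCode selects or recovers weights.

```python
from typing import List, Sequence


def combine_codebook_features(
    features: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted sum of per-codebook latent features (one vector per basis codebook)."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per basis codebook")
    dim = len(features[0])
    out = [0.0] * dim
    for feat, w in zip(features, weights):
        for i in range(dim):
            out[i] += w * feat[i]
    return out


def mask_weights(weights: Sequence[float], masking_rate: float) -> List[bool]:
    """Encoder side: keep-mask that drops the smallest-magnitude fraction of weights."""
    n_drop = int(round(masking_rate * len(weights)))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(by_magnitude[:n_drop])
    return [i not in dropped for i in range(len(weights))]


def recover_weights(
    transmitted: Sequence[float], keep: Sequence[bool], fill_value: float
) -> List[float]:
    """Decoder side (hypothetical): re-insert a fixed fill value at masked positions.

    Assumes the keep-mask is also available to the decoder.
    """
    it = iter(transmitted)
    return [next(it) if k else fill_value for k in keep]


# Toy example: 4 basis codebooks, 2-D latent feature, masking rate 0.5.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
weights = [0.6, 0.05, 0.3, 0.05]
keep = mask_weights(weights, masking_rate=0.5)
print(keep)  # [True, False, True, False]
transmitted = [w for w, k in zip(weights, keep) if k]  # only 2 of 4 weights are sent
latent = combine_codebook_features(features, recover_weights(transmitted, keep, 0.0))
print(latent)
```

Raising `masking_rate` shrinks `transmitted` (fewer bits) at the cost of a coarser latent feature, which is the rate-distortion knob the abstract describes.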

High-content stimulated Raman histology of human breast cancer

Sep 20, 2023
Hongli Ni, Chinmayee Prabhu Dessai, Haonan Lin, Wei Wang, Shaoxiong Chen, Yuhao Yuan, Xiaowei Ge, Jianpeng Ao, Nolan Vild, Ji-Xin Cheng

Histological examination is crucial for cancer diagnosis, including hematoxylin and eosin (H&E) staining for mapping morphology and immunohistochemistry (IHC) staining for revealing chemical information. Recently developed two-color stimulated Raman histology can bypass complex tissue processing to mimic H&E-like morphology. Yet it does not reveal the underlying chemical features, which compromises the effectiveness of prognostic stratification. Here, we present a high-content stimulated Raman histology (HC-SRH) platform that provides both morphological and chemical information for cancer diagnosis based on unstained breast tissues. Through spectral unmixing in the C-H vibration window, HC-SRH can map unsaturated lipids, cellular protein, extracellular matrix, saturated lipids, and water in breast tissue, providing excellent contrast for various tissue components. Because speed is important in clinical trials, we implemented spectrally selective sampling to increase the speed of HC-SRH by an order of magnitude. We also demonstrated HC-SRH on a clinically compatible, fiber laser-based stimulated Raman scattering (SRS) microscope. Using the wide and rapid tuning capability of the advanced fiber laser, the fingerprint-region results show clear chemical contrast of nucleic acid and solid-state ester.

* 6 figures 
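
The abstract mentions spectral unmixing in the C-H window without specifying the algorithm. The sketch below shows the generic linear model, under stated assumptions: each pixel's spectrum is treated as a weighted sum of reference spectra (one per component, e.g. lipids, protein, water), and the weights are recovered by ordinary least squares through the normal equations. The reference spectra are toy numbers. Real pipelines often add constraints such as non-negativity. This is not HC-SRH's implementation.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def unmix(spectrum: Sequence[float], references: Sequence[Sequence[float]]) -> List[float]:
    """Abundances c minimizing ||spectrum - sum_k c_k * references[k]||^2."""
    k = len(references)
    # Normal equations: (R R^T) c = R s, where R holds one reference spectrum per row.
    gram = [[sum(p * q for p, q in zip(references[i], references[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * q for p, q in zip(references[i], spectrum)) for i in range(k)]
    return solve_linear(gram, rhs)


# Toy reference spectra (hypothetical) sampled at 4 Raman shifts, for 3 components.
refs = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
pixel = [2.5, 3.5, 3.0, 2.0]  # built as 2*refs[0] + 3*refs[1] + 0.5*refs[2]
print(unmix(pixel, refs))  # approximately [2.0, 3.0, 0.5]
```

Applied per pixel, the recovered abundances form one chemical map per component.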

Robust Backdoor Attacks on Object Detection in Real World

Sep 16, 2023
Yaguan Qian, Boyuan Ji, Shuke He, Shenhui Huang, Xiang Ling, Bin Wang, Wei Wang

Deep learning models are widely deployed in many applications, such as object detection in various security fields. However, these models are vulnerable to backdoor attacks. Backdoor attacks have been studied intensively on classification models, but little on object detection. Previous works mainly focused on backdoor attacks in the digital world and neglected the real world, where the effect of a backdoor attack is easily influenced by physical factors such as distance and illumination. In this paper, we propose a variable-size backdoor trigger that adapts to the different sizes of attacked objects, overcoming the disturbance caused by the distance between the viewpoint and the attacked object. In addition, we propose a backdoor training method named malicious adversarial training, which enables the backdoored object detector to learn the features of the trigger under physical noise. The experimental results show that this robust backdoor attack (RBA) can enhance the attack success rate in the real world.

* 22 pages, 13 figures 

Unveiling Invariances via Neural Network Pruning

Sep 15, 2023
Derek Xu, Yizhou Sun, Wei Wang

Invariance describes transformations that do not alter data's underlying semantics. Neural networks that preserve natural invariances capture good inductive biases and achieve superior performance. Hence, modern networks are handcrafted to handle well-known invariances (e.g., translations). We propose a framework to learn novel network architectures that capture data-dependent invariances via pruning. Our learned architectures consistently outperform dense neural networks on both vision and tabular datasets in both efficiency and effectiveness. We demonstrate our framework on multiple deep learning models across 3 vision and 40 tabular datasets.
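
The abstract does not state the pruning criterion the framework uses. As a generic stand-in, not the authors' method, the sketch below shows magnitude pruning, the simplest way pruning turns a dense layer into a sparse connectivity pattern. In the framing above, it is the surviving pattern (the boolean mask) that encodes a learned architecture.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Tuple[Matrix, List[List[bool]]]:
    """Zero the smallest-magnitude fraction `sparsity` of a dense layer's weights.

    Returns the pruned weights and a boolean connectivity mask (True = kept).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[r]))]
    positions.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))  # smallest first
    n_prune = int(sparsity * len(positions))
    pruned = set(positions[:n_prune])
    mask = [[(r, c) not in pruned for c in range(len(row))] for r, row in enumerate(weights)]
    new_w = [[w if keep else 0.0 for w, keep in zip(row, mrow)]
             for row, mrow in zip(weights, mask)]
    return new_w, mask


layer = [[0.1, -2.0], [0.5, 0.05]]
pruned_layer, connectivity = magnitude_prune(layer, sparsity=0.5)
print(pruned_layer)   # [[0.0, -2.0], [0.5, 0.0]]
print(connectivity)   # [[False, True], [True, False]]
```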

A DRL-based Reflection Enhancement Method for RIS-assisted Multi-receiver Communications

Sep 11, 2023
Wei Wang, Peizheng Li, Angela Doufexi, Mark A Beach

In reconfigurable intelligent surface (RIS)-assisted wireless communication systems, the pointing accuracy and intensity of reflections depend crucially on the 'profile,' which represents the amplitude/phase state information of all elements in a RIS array. Superposing multiple single-reflection profiles enables multi-reflection for distributed users. However, the optimization challenges arising from periodic element arrangements in single-reflection and multi-reflection profiles are understudied. Combining periodic single-reflection profiles leads to amplitude/phase counteractions that degrade the performance of each reflected beam. This paper focuses on a dual-reflection optimization scenario and investigates the far-field performance deterioration caused by the misalignment of overlapped profiles. To address this issue, we introduce a novel deep reinforcement learning (DRL)-based optimization method. Comparative experiments against random and exhaustive searches demonstrate that our proposed DRL method outperforms both alternatives, achieving the shortest optimization time. Remarkably, our approach achieves a 1.2 dB improvement in reflection peak gain and a broader beam without any hardware modifications.

* 6 pages, 6 figures. This paper has been accepted for presentation at VTC2023-Fall 
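
The amplitude/phase counteraction described above can be seen in a simplified 1-D model. The assumptions are: 16 unit-amplitude elements at half-wavelength spacing, continuous phase control, linear phase gradients steering toward 30° and -20° (illustrative values), and a dual-reflection profile formed by taking the phase of the sum of the two single-reflection element states. This does not reproduce the paper's DRL optimization. It only shows why naive superposition costs gain: each beam's peak drops to roughly 2/π ≈ 0.64 of a single-reflection profile (the idealized continuous-phase value).

```python
import cmath
import math
from typing import List, Sequence


def steering_phases(n_elements: int, spacing_wl: float, angle_deg: float) -> List[float]:
    """Linear phase gradient steering a 1-D array's reflection toward angle_deg."""
    k_d = 2 * math.pi * spacing_wl  # wavenumber times spacing (spacing in wavelengths)
    s = math.sin(math.radians(angle_deg))
    return [-k_d * n * s for n in range(n_elements)]


def superpose(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Phase-only dual-reflection profile: phase of e^{j p1} + e^{j p2} per element."""
    return [cmath.phase(cmath.exp(1j * a) + cmath.exp(1j * b)) for a, b in zip(p1, p2)]


def array_gain(phases: Sequence[float], spacing_wl: float, angle_deg: float) -> float:
    """Far-field array factor |AF| / N for unit-amplitude elements (1.0 = ideal peak)."""
    k_d = 2 * math.pi * spacing_wl
    s = math.sin(math.radians(angle_deg))
    af = sum(cmath.exp(1j * (p + k_d * n * s)) for n, p in enumerate(phases))
    return abs(af) / len(phases)


N, D = 16, 0.5  # 16 elements, half-wavelength spacing (illustrative)
beam_a = steering_phases(N, D, 30.0)
beam_b = steering_phases(N, D, -20.0)
dual = superpose(beam_a, beam_b)

print(round(array_gain(beam_a, D, 30.0), 3))  # 1.0: single-reflection peak
for target in (30.0, -20.0):
    print(target, round(array_gain(dual, D, target), 3))  # each well below 1.0
```

Elements where the two single-reflection states are nearly opposite in phase cancel in the sum, and their resulting phase is essentially arbitrary. That is the counteraction the abstract refers to.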

Mean Field Game-based Waveform Precoding Design for Mobile Crowd Integrated Sensing, Communication, and Computation Systems

Sep 06, 2023
Dezhi Wang, Chongwen Huang, Jiguang He, Xiaoming Chen, Wei Wang, Zhaoyang Zhang, Zhu Han, Mérouane Debbah

Timely data collection and processing are crucial for mobile crowd integrated sensing, communication, and computation (ISCC) systems, which have various applications such as smart homes and connected cars and require numerous integrated sensing and communication (ISAC) devices to sense targets and offload the data to the base station (BS) for further processing. However, as the number of ISAC devices grows, intensive interactions arise among them during data collection and processing, since they share common network resources. In this paper, we consider the environment sensing problem in large-scale mobile crowd ISCC systems and propose an efficient waveform precoding design algorithm based on the mean field game (MFG). Specifically, to handle the complex interactions among large-scale ISAC devices, we first use the MFG method to transform the influence of other ISAC devices into a mean field term and derive the Fokker-Planck-Kolmogorov equation, which models the evolution of the system state. Then, we derive the cost function based on the mean field term and reformulate the waveform precoding design problem. Next, we use the G-prox primal-dual hybrid gradient algorithm to solve the reformulated problem and analyze the computational complexity of the proposed algorithm. Finally, simulation results demonstrate that the proposed algorithm effectively handles the interactions among large-scale ISAC devices in the ISCC process. In addition, compared with other baselines, the proposed waveform precoding design algorithm improves communication performance and reduces the cost function.

* IEEE Transactions on Wireless Communications. 2023  
* 13 pages, 9 figures 

MathAttack: Attacking Large Language Models Towards Math Solving Ability

Sep 04, 2023
Zihao Zhou, Qiufeng Wang, Mingyu Jin, Jie Yao, Jianan Ye, Wei Liu, Wei Wang, Xiaowei Huang, Kaizhu Huang

With the boom of Large Language Models (LLMs), research on solving Math Word Problems (MWPs) has recently made great progress. However, few studies examine the security of LLMs' math solving ability. Instead of attacking the prompts used with LLMs, we propose a MathAttack model that attacks MWP samples, which is closer to the essence of security in solving math problems. Compared with traditional text adversarial attacks, it is essential to preserve the mathematical logic of the original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entities, which are then frozen. Subsequently, the remaining text is attacked with a word-level attacker. Furthermore, we propose a new dataset, RobustMath, to evaluate the robustness of LLMs in math solving ability. Extensive experiments on RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) our adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transferring from larger to smaller LLMs, or from few-shot to zero-shot prompts); (2) complex MWPs (with more solving steps, longer text, or more numbers) are more vulnerable to attack; and (3) we can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important step towards enhancing the robustness of LLMs in math solving ability. We will release our code and dataset.

* 11 pages, 6 figures 
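
The freeze-then-perturb constraint described in the abstract can be illustrated with a toy sketch. The abstract does not specify the logical entity recognizer or the word-level attacker, so both are hypothetical stand-ins here. A regex treats plain numbers as logical entities, a caller-supplied set adds named entities, and a small synonym table plays the role of the attacker. The point is only that frozen tokens are never substituted, so the problem's arithmetic is preserved.

```python
import re
from typing import Dict, List, Set

# Stand-in "logical entity recognizer": plain numbers count as logical entities.
NUMBER = re.compile(r"^\d+(?:\.\d+)?$")


def frozen_positions(tokens: List[str], logical_entities: Set[str]) -> Set[int]:
    """Indices of tokens that must not be perturbed (numbers or listed entities)."""
    return {
        i for i, tok in enumerate(tokens)
        if NUMBER.match(tok) or tok.lower() in logical_entities
    }


def perturb(text: str, synonyms: Dict[str, str], logical_entities: Set[str]) -> str:
    """Substitute words from a (hypothetical) synonym table, skipping frozen tokens."""
    tokens = text.split()
    frozen = frozen_positions(tokens, logical_entities)
    out = []
    for i, tok in enumerate(tokens):
        if i in frozen:
            out.append(tok)  # keep the math logic intact
        else:
            out.append(synonyms.get(tok.lower(), tok))
    return " ".join(out)


problem = "Tom has 3 apples and buys 5 more apples"
synonyms = {"has": "owns", "buys": "purchases", "apples": "pears", "3": "4"}
print(perturb(problem, synonyms, logical_entities={"tom", "apples"}))
# Tom owns 3 apples and purchases 5 more apples
```

Even though the table offers replacements for "3" and "apples", both are frozen. Only the non-logical words "has" and "buys" change.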

Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration

Aug 29, 2023
Liyan Wang, Qinyu Yang, Cong Wang, Wei Wang, Jinshan Pan, Zhixun Su

Recent years have witnessed the remarkable performance of diffusion models in various vision tasks. However, for image restoration, which aims to recover clear images with sharper details from given degraded observations, diffusion-based methods may fail to produce satisfactory results due to inaccurate noise estimation. Moreover, simply constraining noise cannot effectively learn complex degradation information, which in turn limits the model's capacity. To solve these problems, we propose a coarse-to-fine diffusion Transformer (C2F-DFT) for image restoration. Specifically, C2F-DFT contains diffusion self-attention (DFSA) and a diffusion feed-forward network (DFN) within a new coarse-to-fine training scheme. DFSA captures long-range diffusion dependencies and DFN learns hierarchical diffusion representations, which together facilitate better restoration. In the coarse training stage, C2F-DFT estimates noise and then generates the final clean image with a sampling algorithm. To further improve restoration quality, we propose a simple yet effective fine training scheme. It first uses the coarse-trained diffusion model with a fixed number of steps to generate restoration results, which are then constrained by the corresponding ground truth to optimize the model and remedy the unsatisfactory results caused by inaccurate noise estimation. Extensive experiments show that C2F-DFT significantly outperforms the diffusion-based restoration method IR-SDE and achieves competitive performance compared with Transformer-based state-of-the-art methods on 3 tasks: deraining, deblurring, and real denoising. The code is available at

* 9 pages, 8 figures 

DALNet: A Rail Detection Network Based on Dynamic Anchor Line

Aug 24, 2023
Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao

Rail detection is one of the key factors for intelligent trains. In this paper, motivated by anchor line-based lane detection methods, we propose a rail detection network called DALNet based on dynamic anchor lines. To solve the problem that predefined anchor lines are image-agnostic, we design a novel dynamic anchor line mechanism. It uses a dynamic anchor line generator to generate an appropriate anchor line for each rail instance based on the position and shape of the rails in the input image. These dynamically generated anchor lines can be considered better position references than predefined anchor lines for accurately localizing the rails. In addition, we present a challenging urban rail detection dataset, DL-Rail, with high-quality annotations and scenario diversity. DL-Rail contains 7000 pairs of images and annotations along with scene tags, and it is expected to encourage the development of rail detection. We extensively compare DALNet with many competitive lane detection methods. The results show that DALNet achieves state-of-the-art performance on our DL-Rail rail detection dataset and on the popular Tusimple and LLAMAS lane detection benchmarks. The code will be released at
