Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jun Zhang

National Innovation Institute of Defense Technology, Chinese Academy of Military Science

WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Sep 12, 2024

Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang

Figure 1 for WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Figure 2 for WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Figure 3 for WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Abstract:Wireless networks are increasingly facing challenges due to their expanding scale and complexity. These challenges underscore the need for advanced AI-driven strategies, particularly in the upcoming 6G networks. In this article, we introduce WirelessAgent, a novel approach leveraging large language models (LLMs) to develop AI agents capable of managing complex tasks in wireless networks. It can effectively improve network performance through advanced reasoning, multimodal data processing, and autonomous decision making. Thereafter, we demonstrate the practical applicability and benefits of WirelessAgent for network slicing management. The experimental results show that WirelessAgent is capable of accurately understanding user intent, effectively allocating slice resources, and consistently maintaining optimal performance.

Via

Access Paper or Ask Questions

Reinforcement-Learning-Enabled Beam Alignment for Water-Air Direct Optical Wireless Communications

Sep 05, 2024

Jiayue Liu, Tianqi Mao, Dongxuan He, Yang Yang, Zhen Gao, Dezhi Zheng, Jun Zhang

Figure 1 for Reinforcement-Learning-Enabled Beam Alignment for Water-Air Direct Optical Wireless Communications

Figure 2 for Reinforcement-Learning-Enabled Beam Alignment for Water-Air Direct Optical Wireless Communications

Figure 3 for Reinforcement-Learning-Enabled Beam Alignment for Water-Air Direct Optical Wireless Communications

Figure 4 for Reinforcement-Learning-Enabled Beam Alignment for Water-Air Direct Optical Wireless Communications

Abstract:The escalating interests on underwater exploration/reconnaissance applications have motivated high-rate data transmission from underwater to airborne relaying platforms, especially under high-sea scenarios. Thanks to its broad bandwidth and superior confidentiality, Optical wireless communication has become one promising candidate for water-air transmission. However, the optical signals inevitably suffer from deviations when crossing the highly-dynamic water-air interfaces in the absence of relaying ships/buoys. To address the issue, this article proposes one novel beam alignment strategy based on deep reinforcement learning (DRL) for water-air direct optical wireless communications. Specifically, the dynamic water-air interface is mathematically modeled using sea-wave spectrum analysis, followed by characterization of the propagation channel with ray-tracing techniques. Then the deep deterministic policy gradient (DDPG) scheme is introduced for DRL-based transceiving beam alignment. A logarithm-exponential (LE) nonlinear reward function with respect to the received signal strength is designed for high-resolution rewarding between different actions. Simulation results validate the superiority of the proposed DRL-based beam alignment scheme.

* 6 pages, 6 figures, published to ICCC 2024

Via

Access Paper or Ask Questions

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Aug 29, 2024

Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

Abstract:Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency struggles to be accurately preserved in directly generated video frames from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability.

* Project page: https://liuff19.github.io/ReconX

Via

Access Paper or Ask Questions

Exploring Selective Layer Fine-Tuning in Federated Learning

Aug 28, 2024

Yuchang Sun, Yuexiang Xie, Bolin Ding, Yaliang Li, Jun Zhang

Abstract:Federated learning (FL) has emerged as a promising paradigm for fine-tuning foundation models using distributed data in a privacy-preserving manner. Under limited computational resources, clients often find it more practical to fine-tune a selected subset of layers, rather than the entire model, based on their task-specific data. In this study, we provide a thorough theoretical exploration of selective layer fine-tuning in FL, emphasizing a flexible approach that allows the clients to adjust their selected layers according to their local data and resources. We theoretically demonstrate that the layer selection strategy has a significant impact on model convergence in two critical aspects: the importance of selected layers and the heterogeneous choices across clients. Drawing from these insights, we further propose a strategic layer selection method that utilizes local gradients and regulates layer selections across clients. The extensive experiments on both image and text datasets demonstrate the effectiveness of the proposed strategy compared with several baselines, highlighting its advances in identifying critical layers that adapt to the client heterogeneity and training dynamics in FL.

Via

Access Paper or Ask Questions

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

Aug 22, 2024

Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng

Abstract:Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few research explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed novel construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope our ConflictBank benchmark will help the community better understand model behavior in conflicts and develop more reliable LLMs.

* Under Review

Via

Access Paper or Ask Questions

A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Aug 05, 2024

Guo-Yun Lin, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan, Jun Zhang

Figure 1 for A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Figure 2 for A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Figure 3 for A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Figure 4 for A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

Abstract:How to simultaneously locate multiple global peaks and achieve certain accuracy on the found peaks are two key challenges in solving multimodal optimization problems (MMOPs). In this paper, a landscape-aware differential evolution (LADE) algorithm is proposed for MMOPs, which utilizes landscape knowledge to maintain sufficient diversity and provide efficient search guidance. In detail, the landscape knowledge is efficiently utilized in the following three aspects. First, a landscape-aware peak exploration helps each individual evolve adaptively to locate a peak and simulates the regions of the found peaks according to search history to avoid an individual locating a found peak. Second, a landscape-aware peak distinction distinguishes whether an individual locates a new global peak, a new local peak, or a found peak. Accuracy refinement can thus only be conducted on the global peaks to enhance the search efficiency. Third, a landscape-aware reinitialization specifies the initial position of an individual adaptively according to the distribution of the found peaks, which helps explore more peaks. The experiments are conducted on 20 widely-used benchmark MMOPs. Experimental results show that LADE obtains generally better or competitive performance compared with seven well-performed algorithms proposed recently and four winner algorithms in the IEEE CEC competitions for multimodal optimization.

* under review

Via

Access Paper or Ask Questions

Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Jul 30, 2024

Jiawei Shao, Teng Li, Jun Zhang

Figure 1 for Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Figure 2 for Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Figure 3 for Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Figure 4 for Task-Oriented Communication for Vehicle-to-Infrastructure Cooperative Perception

Abstract:Vehicle-to-infrastructure (V2I) cooperative perception plays a crucial role in autonomous driving scenarios. Despite its potential to improve perception accuracy and robustness, the large amount of raw sensor data inevitably results in high communication overhead. To mitigate this issue, we propose TOCOM-V2I, a task-oriented communication framework for V2I cooperative perception, which reduces bandwidth consumption by transmitting only task-relevant information, instead of the raw data stream, for perceiving the surrounding environment. Our contributions are threefold. First, we propose a spatial-aware feature selection module to filter out irrelevant information based on spatial relationships and perceptual prior. Second, we introduce a hierarchical entropy model to exploit redundancy within the features for efficient compression and transmission. Finally, we utilize a scaled dot-product attention architecture to fuse vehicle-side and infrastructure-side features to improve perception performance. Experimental results demonstrate the effectiveness of TOCOM-V2I.

Via

Access Paper or Ask Questions

EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Jul 27, 2024

Shigang Liu, Di Cao, Junae Kim, Tamas Abraham, Paul Montague, Seyit Camtepe, Jun Zhang, Yang Xiang

Figure 1 for EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Figure 2 for EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Figure 3 for EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Figure 4 for EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Abstract:Recently, deep learning has demonstrated promising results in enhancing the accuracy of vulnerability detection and identifying vulnerabilities in software. However, these techniques are still vulnerable to attacks. Adversarial examples can exploit vulnerabilities within deep neural networks, posing a significant threat to system security. This study showcases the susceptibility of deep learning models to adversarial attacks, which can achieve 100% attack success rate (refer to Table 5). The proposed method, EaTVul, encompasses six stages: identification of important samples using support vector machines, identification of important features using the attention mechanism, generation of adversarial data based on these features using ChatGPT, preparation of an adversarial attack pool, selection of seed data using a fuzzy genetic algorithm, and the execution of an evasion attack. Extensive experiments demonstrate the effectiveness of EaTVul, achieving an attack success rate of more than 83% when the snippet size is greater than 2. Furthermore, in most cases with a snippet size of 4, EaTVul achieves a 100% attack success rate. The findings of this research emphasize the necessity of robust defenses against adversarial attacks in software vulnerability detection.

Via

Access Paper or Ask Questions

Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

Jul 15, 2024

Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

Abstract:With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo image compression architecture, named BiSIC. Specifically, we propose a 3D convolution based codec backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Moreover, we design a novel cross-dimensional entropy model that integrates various conditioning factors, including the spatial context, channel context, and stereo dependency, to effectively estimate the distribution of latent representations for entropy coding. Extensive experiments demonstrate that our proposed BiSIC outperforms conventional image/video compression standards, as well as state-of-the-art learning-based methods, in terms of both PSNR and MS-SSIM.

* ECCV 2024

Via

Access Paper or Ask Questions

Multimodal contrastive learning for spatial gene expression prediction using histology images

Jul 11, 2024

Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

Figure 1 for Multimodal contrastive learning for spatial gene expression prediction using histology images

Figure 2 for Multimodal contrastive learning for spatial gene expression prediction using histology images

Figure 3 for Multimodal contrastive learning for spatial gene expression prediction using histology images

Figure 4 for Multimodal contrastive learning for spatial gene expression prediction using histology images

Abstract:In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H\&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose \textbf{mclSTExp}, a multimodal contrastive learning with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of \textbf{mclSTExp} on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.

* BIB, Code: https://github.com/shizhiceng/mclSTExp

Via

Access Paper or Ask Questions