Qian Gao

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

May 24, 2026

Yuanzhi Xu, Qian Gao, Jun Fan, Guohui Ding, Zhenyu Yang, Sixue Lin, Yuteng Xiao

Abstract:The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Language Models (LVLMs). Current approaches to address this issue - ranging from expensive data-driven fine-tuning and high-latency contrastive decoding to rigid attention head truncation - frequently compromise either computational efficiency or the continuity of the model's feature space. To overcome these limitations, we introduce a novel, training-free inference strategy that operates as a region-aware adaptive weighting mechanism to dynamically correct semantic drift without relying on abrupt heuristic truncations. By computing an outlier-resistant statistical midpoint across various attention heads, we establish a stable anchor for reliable visual representations. We then utilize the inter-head disagreement mapped across regions to dynamically determine intervention budgets, gently suppressing hallucination-inducing attention paths through a continuous penalty modulation. This recalibration process effectively rectifies visual-semantic misalignments while fully preserving generative fluency and language priors. Comprehensive evaluations on standard multimodal benchmarks, including CHAIR, POPE, and MME, reveal that our strategy substantially curtails both instance- and sentence-level hallucinations. The results demonstrate state-of-the-art performance against contemporary baselines, confirming our method's efficiency and algorithmic robustness. Our code will be public.
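As an illustration only, the idea of anchoring on a robust midpoint across attention heads and penalizing disagreement can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the authors' implementation (the abstract says their code is not yet public): the function name `recalibrate_attention`, the `strength` parameter, the per-region median as the anchor, the mean absolute deviation as the disagreement measure, and the exponential penalty are all hypothetical choices made for this sketch.

```python
import numpy as np

def recalibrate_attention(attn, strength=5.0):
    """Toy recalibration of per-head attention over image regions.

    attn: array of shape (heads, regions); each row sums to 1.
    strength: hypothetical knob controlling how hard outliers are penalized.
    Returns (recalibrated attention, per-region anchor).
    """
    attn = np.asarray(attn, dtype=float)
    # Outlier-resistant midpoint across heads (assumed here to be the median).
    anchor = np.median(attn, axis=0)
    # How far each head sits from the anchor, per region.
    deviation = np.abs(attn - anchor)
    # Inter-head disagreement per region (assumed: mean absolute deviation),
    # scaled to [0, 1] and used as a per-region intervention budget.
    disagreement = deviation.mean(axis=0)
    budget = disagreement / (disagreement.max() + 1e-12)
    # Continuous penalty: smooth down-weighting instead of hard truncation.
    penalty = np.exp(-strength * budget * deviation)
    out = attn * penalty
    # Renormalize so each head is still a distribution over regions.
    return out / out.sum(axis=1, keepdims=True), anchor

# Three heads agree; the fourth puts most of its mass on region 2.
attn = np.array([
    [0.5, 0.3, 0.2],
    [0.5, 0.3, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
])
recalibrated, anchor = recalibrate_attention(attn)
print(anchor)
print(recalibrated.round(3))
```

In this toy input the three agreeing heads are left untouched, while the outlying head's weight on the contested region is reduced smoothly rather than zeroed out.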

LLM-enabled Antenna Partitioning and Beamforming Optimization for Segmented Pinching

Apr 11, 2026

Qian Gao, Ruikang Zhong, Hyundong Shin, Yuanwei Liu

Abstract:Integrated sensing and communication (ISAC) requires spatial architectures that can flexibly balance data transmission and environment sensing. Segmented pinching antenna-assisted ISAC provides such flexibility by allowing different waveguide segments to be dynamically configured for transmission and reception. However, its design involves the joint optimization of antenna deployment, segment partitioning, and beamforming under coupled communication and sensing constraints, which becomes particularly challenging when the numbers of communication users and sensing targets vary across scenarios. To endow the system with stronger adaptability to changing user and target configurations, we propose a general learning framework for segmented pinching antenna-assisted ISAC systems. Specifically, a channel state information (CSI)-induced self-graph is constructed to produce permutation-invariant representations of user-target interactions, and the resulting features are processed by a large language model (LLM) backbone with two task-specific heads for jointly predicting antenna deployment, segment partitioning, and ISAC beamforming. In addition, a user count transfer mechanism is developed to examine whether the learned deployment policy is site-specific and reusable under changed user configurations. Simulation results show that the proposed framework achieves higher communication rates while maintaining reliable sensing accuracy. Moreover, the learned deployment policy remains highly stable when transferring to other user counts, which reduces the training cost from full model retraining to beamforming head adaption.

* arXiv admin note: substantial text overlap with arXiv:2604.10256

Graph-Enhanced LLM for SWAN-ISAC

Apr 11, 2026

Qian Gao, Ruikang Zhong, Yuanwei Liu

Abstract:Segmented pinching antenna assisted integrated sensing and communication (ISAC) systems enable flexible spatial resource utilization by allowing different waveguide segments to be dynamically configured for transmission and reception. However, the resulting design requires the joint optimization of antenna deployment, segment partitioning, and beamforming under coupled communication and sensing constraints. In this paper, we propose a general learning framework for segmented pinching antenna assisted ISAC systems. Specifically, a channel state information (CSI)-induced self-graph is constructed to capture the scenario-dependent interactions among communication users and sensing targets. Based on the learned graph representation, a large language model (LLM) backbone with low-rank adaptation (LoRA) is employed, followed by two task-specific output heads for antenna deployment and beamforming prediction, respectively. Simulation results show that the proposed framework achieves a favorable tradeoff between communication rate and sensing accuracy.

Active Inference for Micro-Gesture Recognition: EFE-Guided Temporal Sampling and Adaptive Learning

Mar 08, 2026

Weijia Feng, Jingyu Yang, Ruojia Zhang, Fengtao Sun, Qian Gao, Chenyang Wang, Tongtong Su, Jia Guo, Xiaobai Li, Minglai Shao

Abstract:Micro-gestures are subtle and transient movements triggered by unconscious neural and emotional activities, holding great potential for human-computer interaction and clinical monitoring. However, their low amplitude, short duration, and strong inter-subject variability make existing deep models prone to degradation under low-sample, noisy, and cross-subject conditions. This paper presents an active inference-based framework for micro-gesture recognition, featuring Expected Free Energy (EFE)-guided temporal sampling and uncertainty-aware adaptive learning. The model actively selects the most discriminative temporal segments under EFE guidance, enabling dynamic observation and information gain maximization. Meanwhile, sample weighting driven by predictive uncertainty mitigates the effects of label noise and distribution shift. Experiments on the SMG dataset demonstrate the effectiveness of the proposed method, achieving consistent improvements across multiple mainstream backbones. Ablation studies confirm that both the EFE-guided observation and the adaptive learning mechanism are crucial to the performance gains. This work offers an interpretable and scalable paradigm for temporal behavior modeling under low-resource and noisy conditions, with broad applicability to wearable sensing, HCI, and clinical emotion monitoring.

* 10 pages, accepted by CVPR 2026
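The uncertainty-driven sample weighting mentioned in the abstract above can be illustrated with a small NumPy sketch. It is a minimal example under assumptions, not the paper's method: the helper names `uncertainty_weights` and `weighted_cross_entropy` are hypothetical, and using normalized predictive entropy as the uncertainty measure is a choice made here for illustration. The EFE-guided temporal sampling step is not modeled.

```python
import numpy as np

def uncertainty_weights(probs):
    """Map predictive entropy to a weight in [0, 1] (1 = fully confident).

    probs: array of shape (samples, classes); each row sums to 1.
    """
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    max_entropy = np.log(probs.shape[1])  # entropy of a uniform prediction
    return np.clip(1.0 - entropy / max_entropy, 0.0, 1.0)

def weighted_cross_entropy(probs, labels):
    """Cross-entropy where uncertain samples contribute less to the loss."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    weights = uncertainty_weights(probs)
    # Per-sample negative log-likelihood of the labeled class.
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return np.sum(weights * nll) / (np.sum(weights) + 1e-12)

probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain prediction
])
labels = np.array([0, 1])
print(uncertainty_weights(probs))
print(weighted_cross_entropy(probs, labels))
```

Here the maximally uncertain sample receives a weight close to zero, so a possibly mislabeled or out-of-distribution clip has little influence on the loss.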

RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems

Jan 28, 2026

Qian Gao, Ruikang Zhong, Yue Liu, Hyundong Shin, Yuanwei Liu

Abstract:In this paper, a three-dimensional (3D) deployment scheme for a pinching antenna array is proposed, aiming to enhance the performance of integrated sensing and communication (ISAC) systems. To fully realize the potential of 3D deployment, a joint antenna positioning, time allocation, and transmit power optimization problem is formulated to maximize the sum communication rate under constraints on target sensing rates and system energy. To solve the sum rate maximization problem, we propose a heterogeneous graph neural network based reinforcement learning (HGRL) algorithm. Simulation results show that 3D deployment of the pinching antenna array outperforms its 1D and 2D counterparts in ISAC systems. Moreover, the proposed HGRL algorithm surpasses other baselines in both performance and convergence speed due to the advanced observation construction of the environment.

Deep Learning based Three-stage Solution for ISAC Beamforming Optimization

Jan 28, 2026

Qian Gao, Ruikang Zhong, Yuanwei Liu

Abstract:In this paper, a general ISAC system in which the base station (BS) communicates with multiple users and performs target detection is considered. A sum communication rate maximization problem is then formulated, subject to constraints on transmit power and the minimum sensing rates of users. To solve this problem, we develop a framework that leverages deep learning algorithms to provide a three-stage solution for ISAC beamforming. The three-stage beamforming optimization solution includes three modules: 1) an unsupervised learning based feature extraction algorithm is proposed to extract fixed-size latent features from the variable channel state information (CSI) while keeping its essential information; 2) a reinforcement learning (RL) based beampattern optimization algorithm is proposed to search for the desired beampattern according to the extracted features; 3) a supervised learning based beamforming reconstruction algorithm is proposed to reconstruct the beamforming vector from the beampattern given by the RL agent. Simulation results demonstrate that the proposed three-stage solution outperforms the baseline RL algorithm by optimizing the intuitive beampattern rather than the beamforming.

GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations

Feb 17, 2025

Zhuoning Guo, Guangxing Chen, Qian Gao, Xiaochao Liao, Jianjia Zheng, Lu Shen, Hao Liu

Abstract:Web recommendations provide personalized items from massive catalogs for users, which rely heavily on retrieval stages to trade off the effectiveness and efficiency of selecting a small relevant set from billion-scale candidates in online digital platforms. As one of the largest Chinese search engine and news feed providers, Baidu resorts to Deep Neural Network (DNN) and graph-based Approximate Nearest Neighbor Search (ANNS) algorithms for accurate relevance estimation and efficient search for relevant items. However, current retrieval at Baidu fails in comprehensive user-item relational understanding due to dissected interaction modeling, and performs inefficiently in large-scale graph-based ANNS because of suboptimal traversal navigation and the GPU computational bottleneck under high concurrency. To this end, we propose a GPU-accelerated Multi-relational Parallel Graph Retrieval (GMP-GR) framework to achieve effective yet efficient retrieval in web-scale recommendations. First, we propose a multi-relational user-item relevance metric learning method that unifies diverse user behaviors through multi-objective optimization and employs a self-covariant loss to enhance pathfinding performance. Second, we develop a hierarchical parallel graph-based ANNS to boost graph retrieval throughput, which conducts breadth-depth-balanced searches on a large-scale item graph and cost-effectively handles irregular neural computation via adaptive aggregation on GPUs. In addition, we integrate system optimization strategies in the deployment of GMP-GR in Baidu. Extensive experiments demonstrate the superiority of GMP-GR in retrieval accuracy and efficiency. Deployed across more than twenty applications at Baidu, GMP-GR serves hundreds of millions of users with a throughput exceeding one hundred million requests per second.

Controllable Edge-Type-Specific Interpretation in Multi-Relational Graph Neural Networks for Drug Response Prediction

Sep 03, 2024

Xiaodi Li, Jianfeng Gui, Qian Gao, Haoyuan Shi, Zhenyu Yue

Abstract:Graph Neural Networks have been widely applied in critical decision-making areas that demand interpretable predictions, leading to the flourishing development of interpretability algorithms. However, current graph interpretability algorithms tend to emphasize generality and often overlook biological significance, thereby limiting their applicability in predicting cancer drug responses. In this paper, we propose a novel post-hoc interpretability algorithm for cancer drug response prediction, CETExplainer, which incorporates a controllable edge-type-specific weighting mechanism. It considers the mutual information between subgraphs and predictions, proposing a structural scoring approach to provide fine-grained, biologically meaningful explanations for predictive models. We also introduce a method for constructing ground truth based on real-world datasets to quantitatively evaluate the proposed interpretability algorithm. Empirical analysis on the real-world dataset demonstrates that CETExplainer achieves superior stability and improves explanation quality compared to leading algorithms, thereby offering a robust and insightful tool for cancer drug prediction.

DRExplainer: Quantifiable Interpretability in Drug Response Prediction with Directed Graph Convolutional Network

Aug 22, 2024

Haoyuan Shi, Tao Xu, Xiaodi Li, Qian Gao, Junfeng Xia, Zhenyu Yue

Abstract:Predicting the response of a cancer cell line to a therapeutic drug is pivotal for personalized medicine. Although numerous deep learning methods have been developed for drug response prediction, integrating diverse information about biological entities and predicting the directional response remain major challenges. Here, we propose a novel interpretable predictive model, DRExplainer, which leverages a directed graph convolutional network to enhance the prediction in a directed bipartite network framework. DRExplainer constructs a directed bipartite network integrating multi-omics profiles of cell lines, the chemical structure of drugs, and known drug responses to achieve directed prediction. Then, DRExplainer identifies the most relevant subgraph to each prediction in this directed bipartite network by learning a mask, facilitating critical medical decision-making. Additionally, we introduce a quantifiable method for model interpretability that leverages a ground truth benchmark dataset curated from biological features. In computational experiments, DRExplainer outperforms state-of-the-art predictive methods and another graph-based explanation method under the same experimental setting. Finally, the case studies further validate the interpretability and the effectiveness of DRExplainer in predicting novel drug responses. Our code is available at: https://github.com/vshy-dream/DRExplainer.

Autosen: improving automatic wifi human sensing through cross-modal autoencoder

Jan 08, 2024

Qian Gao, Yanling Hao, Yuanwei Liu

Abstract:WiFi human sensing is highly regarded for its low-cost and privacy advantages in recognizing human activities. However, its effectiveness is largely confined to controlled, single-user, line-of-sight settings, limited by data collection complexities and the scarcity of labeled datasets. Traditional cross-modal methods, aimed at mitigating these limitations by enabling self-supervised learning without labeled data, struggle to extract meaningful features from amplitude-phase combinations. In response, we introduce AutoSen, an innovative automatic WiFi sensing solution that departs from conventional approaches. AutoSen establishes a direct link between amplitude and phase through automated cross-modal autoencoder learning. This autoencoder efficiently extracts valuable features from unlabeled CSI data, encompassing amplitude and phase information while eliminating their respective unique noises. These features are then leveraged for specific tasks using few-shot learning techniques. AutoSen's performance is rigorously evaluated on a publicly accessible benchmark dataset, demonstrating its exceptional capabilities in automatic WiFi sensing through the extraction of comprehensive cross-modal features.
