Henry
Abstract:This paper proposes an adaptive near-field beam training method to enhance performance in multi-user and multipath environments. The approach identifies multiple strongest beams through beam sweeping and linearly combines their received signals - capturing both amplitude and phase - for improved channel estimation. Two codebooks are considered: the conventional DFT codebook and a near-field codebook that samples both angular and distance domains. As the near-field basis functions are generally non-orthogonal and often over-complete, we exploit sparsity in the solution using LASSO-based linear regression, which can also suppress noise. Simulation results show that the near-field codebook reduces feedback overhead by up to 95% compared to the DFT codebook. The proposed LASSO regression method also maintains robustness under varying noise levels, particularly in low SNR regions. Furthermore, an off-grid refinement scheme is introduced to enhance accuracy especially when the codebook sampling is coarse, improving reconstruction accuracy by 69.4%.
Abstract:In this paper, we propose a novel intelligent polarforming antenna (IPA) to achieve cost-effective wireless sensing and communication. Specifically, the IPA can enable polarforming by adaptively controlling the antenna's polarization electrically as well as its position/rotation mechanically, so as to effectively exploit polarization and spatial diversity to reconfigure wireless channels for improving sensing and communication performance. We study an IPA-enhanced integrated sensing and communication (ISAC) system that utilizes user location sensing to facilitate communication between an IPA-equipped base station (BS) and IPA-equipped users. First, we model the IPA channel in terms of transceiver antenna polarforming vectors and antenna positions/rotations. We then propose a two-timescale ISAC protocol, where in the slow timescale, user localization is first performed, followed by the optimization of the BS antennas' positions and rotations based on the sensed user locations; subsequently, in the fast timescale, transceiver polarforming is adapted to cater to the instantaneous channel state information (CSI), with the optimized BS antennas' positions and rotations. We propose a new polarforming-based user localization method that uses a structured time-domain pattern of pilot-polarforming vectors to extract the common stable components in the IPA channel across different polarizations based on the parallel factor (PARAFAC) tensor model. Moreover, we maximize the achievable average sum-rate of users by jointly optimizing the fast-timescale transceiver polarforming, including phase shifts and amplitude variations, along with the slow-timescale antenna rotations and positions at the BS. Simulation results validate the effectiveness of polarforming-based localization algorithm and demonstrate the performance advantages of polarforming, antenna placement, and their joint design.




Abstract:Explainable disease diagnosis, which leverages patient information (e.g., signs and symptoms) and computational models to generate probable diagnoses and reasonings, offers clear clinical values. However, when clinical notes encompass insufficient evidence for a definite diagnosis, such as the absence of definitive symptoms, diagnostic uncertainty usually arises, increasing the risk of misdiagnosis and adverse outcomes. Although explicitly identifying and explaining diagnostic uncertainties is essential for trustworthy diagnostic systems, it remains under-explored. To fill this gap, we introduce ConfiDx, an uncertainty-aware large language model (LLM) created by fine-tuning open-source LLMs with diagnostic criteria. We formalized the task and assembled richly annotated datasets that capture varying degrees of diagnostic ambiguity. Evaluating ConfiDx on real-world datasets demonstrated that it excelled in identifying diagnostic uncertainties, achieving superior diagnostic performance, and generating trustworthy explanations for diagnoses and uncertainties. To our knowledge, this is the first study to jointly address diagnostic uncertainty recognition and explanation, substantially enhancing the reliability of automatic diagnostic systems.




Abstract:In this paper, we present a new form of backdoor attack against Large Language Models (LLMs): lingual-backdoor attacks. The key novelty of lingual-backdoor attacks is that the language itself serves as the trigger to hijack the infected LLMs to generate inflammatory speech. They enable the precise targeting of a specific language-speaking group, exacerbating racial discrimination by malicious entities. We first implement a baseline lingual-backdoor attack, which is carried out by poisoning a set of training data for specific downstream tasks through translation into the trigger language. However, this baseline attack suffers from poor task generalization and is impractical in real-world settings. To address this challenge, we design BadLingual, a novel task-agnostic lingual-backdoor, capable of triggering any downstream tasks within the chat LLMs, regardless of the specific questions of these tasks. We design a new approach using PPL-constrained Greedy Coordinate Gradient-based Search (PGCG) based adversarial training to expand the decision boundary of lingual-backdoor, thereby enhancing the generalization ability of lingual-backdoor across various tasks. We perform extensive experiments to validate the effectiveness of our proposed attacks. Specifically, the baseline attack achieves an ASR of over 90% on the specified tasks. However, its ASR reaches only 37.61% across six tasks in the task-agnostic scenario. In contrast, BadLingual brings up to 37.35% improvement over the baseline. Our study sheds light on a new perspective of vulnerabilities in LLMs with multilingual capabilities and is expected to promote future research on the potential defenses to enhance the LLMs' robustness




Abstract:Objectives: We aim to dynamically retrieve informative demonstrations, enhancing in-context learning in multimodal large language models (MLLMs) for disease classification. Methods: We propose a Retrieval-Augmented In-Context Learning (RAICL) framework, which integrates retrieval-augmented generation (RAG) and in-context learning (ICL) to adaptively select demonstrations with similar disease patterns, enabling more effective ICL in MLLMs. Specifically, RAICL examines embeddings from diverse encoders, including ResNet, BERT, BioBERT, and ClinicalBERT, to retrieve appropriate demonstrations, and constructs conversational prompts optimized for ICL. We evaluated the framework on two real-world multi-modal datasets (TCGA and IU Chest X-ray), assessing its performance across multiple MLLMs (Qwen, Llava, Gemma), embedding strategies, similarity metrics, and varying numbers of demonstrations. Results: RAICL consistently improved classification performance. Accuracy increased from 0.7854 to 0.8368 on TCGA and from 0.7924 to 0.8658 on IU Chest X-ray. Multi-modal inputs outperformed single-modal ones, with text-only inputs being stronger than images alone. The richness of information embedded in each modality will determine which embedding model can be used to get better results. Few-shot experiments showed that increasing the number of retrieved examples further enhanced performance. Across different similarity metrics, Euclidean distance achieved the highest accuracy while cosine similarity yielded better macro-F1 scores. RAICL demonstrated consistent improvements across various MLLMs, confirming its robustness and versatility. Conclusions: RAICL provides an efficient and scalable approach to enhance in-context learning in MLLMs for multimodal disease classification.




Abstract:Accurately modeling and forecasting complex systems governed by partial differential equations (PDEs) is crucial in various scientific and engineering domains. However, traditional numerical methods struggle in real-world scenarios due to incomplete or unknown physical laws. Meanwhile, machine learning approaches often fail to generalize effectively when faced with scarce observational data and the challenge of capturing local and global features. To this end, we propose the Physics-encoded Spectral Attention Network (PeSANet), which integrates local and global information to forecast complex systems with limited data and incomplete physical priors. The model consists of two key components: a physics-encoded block that uses hard constraints to approximate local differential operators from limited data, and a spectral-enhanced block that captures long-range global dependencies in the frequency domain. Specifically, we introduce a novel spectral attention mechanism to model inter-spectrum relationships and learn long-range spatial features. Experimental results demonstrate that PeSANet outperforms existing methods across all metrics, particularly in long-term forecasting accuracy, providing a promising solution for simulating complex systems with limited data and incomplete physics.
Abstract:Intelligent reflecting surface (IRS) is a promising paradigm to reconfigure the wireless environment for enhanced communication coverage and quality. However, to compensate for the double pathloss effect, massive IRS elements are required, raising concerns on the scalability of cost and complexity. This paper introduces a new architecture of quasi-static IRS (QS-IRS), which tunes element phases via mechanical adjustment or manually re-arranging the array topology. QS-IRS relies on massive production/assembly of purely passive elements only, and thus is suitable for ultra low-cost and large-scale deployment to enhance long-term coverage. To achieve this end, an IRS-aided area coverage problem is formulated, which explicitly considers the element radiation pattern (ERP), with the newly introduced shape masks for the mainlobe, and the sidelobe constraints to reduce energy leakage. An alternating optimization (AO) algorithm based on the difference-of-convex (DC) and successive convex approximation (SCA) procedure is proposed, which achieves shaped beamforming with power gains close to that of the joint optimization algorithm, but with significantly reduced computational complexity.




Abstract:High-resolution image (HRI) understanding aims to process images with a large number of pixels, such as pathological images and agricultural aerial images, both of which can exceed 1 million pixels. Vision Large Language Models (VLMs) can allegedly handle HRIs, however, there is a lack of a comprehensive benchmark for VLMs to evaluate HRI understanding. To address this gap, we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from 1,024 $\times$ 1,024 to 35,503 $\times$ 26,627. HRScene is collected and re-annotated by 10 graduate-level annotators, covering 25 scenarios, ranging from microscopic to radiology images, street views, long-range pictures, and telescope images. It includes HRIs of real-world objects, scanned documents, and composite multi-image. The two diagnostic evaluation datasets are synthesized by combining the target image with the gold answer and distracting images in different orders, assessing how well models utilize regions in HRI. We conduct extensive experiments involving 28 VLMs, including Gemini 2.0 Flash and GPT-4o. Experiments on HRScene show that current VLMs achieve an average accuracy of around 50% on real-world tasks, revealing significant gaps in HRI understanding. Results on synthetic datasets reveal that VLMs struggle to effectively utilize HRI regions, showing significant Regional Divergence and lost-in-middle, shedding light on future research.
Abstract:Six-dimensional movable antenna (6DMA) is a promising technology to fully exploit spatial variation in wireless channels by allowing flexible adjustment of three-dimensional (3D) positions and rotations of antennas at the transceiver. In this paper, we investigate the practical low-complexity design of 6DMA-enabled communication systems, including transmission protocol, statistical channel information (SCI) acquisition, and joint position and rotation optimization of 6DMA surfaces based on the SCI of users. Specifically, an orthogonal matching pursuit (OMP)-based algorithm is proposed for the estimation of SCI of users at all possible position-rotation pairs of 6DMA surfaces based on the channel measurements at a small subset of position-rotation pairs. Then, the average sum logarithmic rate of all users is maximized by jointly designing the positions and rotations of 6DMA surfaces based on their SCI acquired. Different from prior works on 6DMA which adopt alternating optimization to design 6DMA positions/rotations with iterations, we propose a new sequential optimization approach that first determines 6DMA rotations and then finds their feasible positions to realize the optimized rotations subject to practical antenna placement constraints. Simulation results show that the proposed sequential optimization significantly reduces the computational complexity of conventional alternating optimization, while achieving comparable communication performance. It is also shown that the proposed SCI-based 6DMA design can effectively enhance the communication throughput of wireless networks over existing fixed (position and rotation) antenna arrays, yet with a practically appealing low-complexity implementation.




Abstract:Integrated sensing and communication (ISAC) is one of the key usage scenarios for future sixth-generation (6G) mobile communication networks, where communication and sensing (C&S) services are simultaneously provided through shared wireless spectrum, signal processing modules, hardware, and network infrastructure. Such an integration is strengthened by the technology trends in 6G, such as denser network nodes, larger antenna arrays, wider bandwidths, higher frequency bands, and more efficient utilization of spectrum and hardware resources, which incentivize and empower enhanced sensing capabilities. As the dominant waveform used in contemporary communication systems, orthogonal frequency division multiplexing (OFDM) is still expected to be a very competitive technology for 6G, rendering it necessary to thoroughly investigate the potential and challenges of OFDM ISAC. Thus, this paper aims to provide a comprehensive tutorial overview of ISAC systems enabled by large-scale multi-input multi-output (MIMO) and OFDM technologies and to discuss their fundamental principles, advantages, and enabling signal processing methods. To this end, a unified MIMO-OFDM ISAC system model is first introduced, followed by four frameworks for estimating parameters across the spatial, delay, and Doppler domains, including parallel one-domain, sequential one-domain, joint two-domain, and joint three-domain parameter estimation. Next, sensing algorithms and performance analyses are presented in detail for far-field scenarios where uniform plane wave (UPW) propagation is valid, followed by their extensions to near-field scenarios where uniform spherical wave (USW) characteristics need to be considered. Finally, this paper points out open challenges and outlines promising avenues for future research on MIMO-OFDM ISAC.