Henry




Abstract:Movable antenna (MA) technology offers promising potential to enhance wireless communication by allowing flexible antenna movement. To maximize spatial degrees of freedom (DoFs), larger movable regions are required, which may render the conventional far-field assumption for channels between transceivers invalid. In light of it, we investigate in this paper MA-enabled near-field communications, where a base station (BS) with multiple movable subarrays serves multiple users, each equipped with a fixed-position antenna (FPA). First, we extend the field response channel model for MA systems to the near-field propagation scenario. Next, we examine MA-aided multiuser communication systems under both digital and analog beamforming architectures. For digital beamforming, spatial division multiple access (SDMA) is utilized, where an upper bound on the minimum signal-to-interference-plus-noise ratio (SINR) across users is derived in closed form. A low-complexity algorithm based on zero-forcing (ZF) is then proposed to jointly optimize the antenna position vector (APV) and digital beamforming matrix (DBFM) to approach this bound. For analog beamforming, orthogonal frequency division multiple access (OFDMA) is employed, and an upper bound on the minimum signal-to-noise ratio (SNR) among users is derived. An alternating optimization (AO) algorithm is proposed to iteratively optimize the APV, analog beamforming vector (ABFV), and power allocation until convergence. For both architectures, we further explore MA design strategies based on statistical channel state information (CSI), with the APV updated less frequently to reduce the antenna movement overhead. Simulation results demonstrate that our proposed algorithms achieve performance close to the derived bounds and also outperform the benchmark schemes using dense or sparse arrays with FPAs.




Abstract:Movable antenna (MA) is an emerging technology that can significantly improve communication performance via the continuous adjustment of the antenna positions. To unleash the potential of MAs in wideband communication systems, acquiring accurate channel state information (CSI), i.e., the channel frequency responses (CFRs) between any position pair within the transmit (Tx) region and the receive (Rx) region across all subcarriers, is a crucial issue. In this paper, we study the channel estimation problem for wideband MA systems. To start with, we express the CFRs as a combination of the field-response vectors (FRVs), delay-response vector (DRV), and path-response tensor (PRT), which exhibit sparse characteristics and can be recovered by using a limited number of channel measurements at selected position pairs of Tx and Rx MAs over a few subcarriers. Specifically, we first formulate the recovery of the FRVs and DRV as a problem with multiple measurement vectors in compressed sensing (MMV-CS), which can be solved via a simultaneous orthogonal matching pursuit (SOMP) algorithm. Next, we estimate the PRT using the least-square (LS) method. Moreover, we also devise an alternating refinement approach to further improve the accuracy of the estimated FRVs, DRV, and PRT. This is achieved by minimizing the discrepancy between the received pilots and those constructed by the estimated CSI, which can be efficiently carried out by using the gradient descent algorithm. Finally, simulation results demonstrate that both the SOMP-based channel estimation method and alternating refinement method can reconstruct the complete wideband CSI with high accuracy, where the alternating refinement method performs better despite a higher complexity.




Abstract:Six-dimensional movable antenna (6DMA) is an innovative technology to improve wireless network capacity by adjusting 3D positions and 3D rotations of antenna surfaces based on channel spatial distribution. However, the existing works on 6DMA have assumed a central processing unit (CPU) to jointly process the signals of all 6DMA surfaces to execute various tasks. This inevitably incurs prohibitively high processing cost for channel estimation. Therefore, we propose a distributed 6DMA processing architecture to reduce processing complexity of CPU by equipping each 6DMA surface with a local processing unit (LPU). In particular, we unveil for the first time a new \textbf{\textit{directional sparsity}} property of 6DMA channels, where each user has significant channel gains only for a (small) subset of 6DMA position-rotation pairs, which can receive direct/reflected signals from users. In addition, we propose a practical three-stage protocol for the 6DMA-equipped base station (BS) to conduct statistical CSI acquisition for all 6DMA candidate positions/rotations, 6DMA position/rotation optimization, and instantaneous channel estimation for user data transmission with optimized 6DMA positions/rotations. Specifically, the directional sparsity is leveraged to develop distributed algorithms for joint sparsity detection and channel power estimation, as well as for directional sparsity-aided instantaneous channel estimation. Using the estimated channel power, we develop a channel power-based optimization algorithm to maximize the ergodic sum rate of the users by optimizing the antenna positions/rotations. Simulation results show that our channel estimation algorithms are more accurate than benchmarks with lower pilot overhead, and our optimization outperforms fluid/movable antennas optimized only in two dimensions (2D), even when the latter have perfect instantaneous CSI.




Abstract:This paper presents, for the first time, the concept of \textit{polarforming} for wireless communications. Polarforming refers to a novel technique that enables dynamic adjustment of antenna polarization using reconfigurable polarized antennas (RPAs). It can fully leverage polarization diversity to improve the performance of wireless communication systems by aligning the effective polarization state of the incoming electromagnetic (EM) wave with the antenna polarization. To better demonstrate the benefits of polarforming, we propose a general RPA-aided system that allows for tunable antenna polarization. A wavefront-based channel model is developed to properly capture depolarization behaviors in both line-of-sight (LoS) and non-line-of-sight (NLoS) channels. Based on this model, we provide a detailed description of transmit and receive polarforming on planes of polarization (PoPs). We also evaluate the performance gains provided by polarforming under stochastic channel conditions. Specifically, we derive a closed-form expression for the relative signal-to-noise ratio (SNR) gain compared to conventional fixed-polarization antenna (FPA) systems and approximate the cumulative distribution function (CDF) for the RPA system. Our analysis reveals that polarforming offers a diversity gain of two, indicating full utilization of polarization diversity for dual-polarized antennas. Furthermore, extensive simulation results validate the effectiveness of polarforming and exhibit substantial improvements over conventional FPA systems. The results also indicate that polarforming not only can combat depolarization effects caused by wireless channels but also can overcome channel correlation when scattering is insufficient.
Abstract:Multimodal Large Language Models (MLLMs) have demonstrated great zero-shot performance on visual question answering (VQA). However, when it comes to knowledge-based VQA (KB-VQA), MLLMs may lack human commonsense or specialized domain knowledge to answer such questions and require obtaining necessary information from external knowledge sources. Previous works like Retrival-Augmented VQA-v2 (RAVQA-v2) focus on utilizing as much input information, such as image-based textual descriptions and retrieved knowledge, as possible to improve performance, but they all overlook the issue that with the number of input tokens increasing, inference efficiency significantly decreases, which contradicts the demands of practical applications. To address this issue, we propose Retrieval-Augmented MLLM with Compressed Contexts (RACC). RACC learns to compress and aggregate retrieved contexts, from which it generates a compact modulation in the form of Key-Value (KV) cache. This modulation is then used to adapt the downstream frozen MLLM, thereby achieving effective and efficient inference. RACC achieves a state-of-the-art (SOTA) performance of 62.9% on OK-VQA. Moreover, it significantly reduces inference latency by 22.0%-59.7% compared to the prominent RAVQA-v2. Abundant experiments show RACC's broad applicability. It is compatible with various off-the-shelf MLLMs and can also handle different knowledge sources including textual and multimodal documents.




Abstract:Formula recognition presents significant challenges due to the complicated structure and varied notation of mathematical expressions. Despite continuous advancements in formula recognition models, the evaluation metrics employed by these models, such as BLEU and Edit Distance, still exhibit notable limitations. They overlook the fact that the same formula has diverse representations and is highly sensitive to the distribution of training data, thereby causing the unfairness in formula recognition evaluation. To this end, we propose a Character Detection Matching (CDM) metric, ensuring the evaluation objectivity by designing a image-level rather than LaTex-level metric score. Specifically, CDM renders both the model-predicted LaTeX and the ground-truth LaTeX formulas into image-formatted formulas, then employs visual feature extraction and localization techniques for precise character-level matching, incorporating spatial position information. Such a spatially-aware and character-matching method offers a more accurate and equitable evaluation compared with previous BLEU and Edit Distance metrics that rely solely on text-based character matching. Experimentally, we evaluated various formula recognition models using CDM, BLEU, and ExpRate metrics. Their results demonstrate that the CDM aligns more closely with human evaluation standards and provides a fairer comparison across different models by eliminating discrepancies caused by diverse formula representations.


Abstract:With a great potential of improving the service fairness and quality for user equipments (UEs), cell-free massive multiple-input multiple-output (mMIMO) has been regarded as an emerging candidate for 6G network architectures. Under ideal assumptions, the coherent joint transmission (CJT) serving mode has been considered as an optimal option for cell-free mMIMO systems, since it can achieve coherent cooperation gain among the access points. However, when considering the limited fronthaul constraint in practice, the non-coherent joint transmission (NCJT) serving mode is likely to outperform CJT, since the former requires much lower fronthaul resources. In other words, the performance excellence and worseness of single serving mode (CJT or NCJT) depends on the fronthaul capacity, and any single transmission mode cannot perfectly adapt the capacity limited fronthaul. To explore the performance potential of the cell-free mMIMO system with limited fronthauls by harnessing the merits of CJT and NCJT, we propose a CJT-NCJT hybrid serving mode framework, in which UEs are allocated to operate on CJT or NCJT serving mode. To improve the sum-rate of the system with low complexity, we first propose a probability-based random serving mode allocation scheme. With a given serving mode, a successive convex approximation-based power allocation algorithm is proposed to maximize the system's sum-rate. Simulation results demonstrate the superiority of the proposed scheme.
Abstract:Six-dimensional movable antenna (6DMA) is an emerging technology that is able to fully exploit the spatial variation of wireless channels by controlling the 3D positions and 3D rotations of distributed antennas/antenna surfaces at the transmitter/receiver. In this letter, we apply 6DMA at the base station (BS) to enhance its wireless sensing performance over a given set of regions. To this end, we first divide each region into a number of equal-size subregions and select one typical target location within each subregion. Then, we derive an expression for the Cramer-Rao bound (CRB) for estimating the directions of arrival (DoAs) from these typical target locations in all regions, which sheds light on the sensing performance of 6DMA enhanced systems in terms of a power gain and a geometric gain. Next, we minimize the CRB for DoA estimation via jointly optimizing the positions and rotations of all 6DMAs at the BS, subject to practical movement constraints, and propose an efficient algorithm to solve the resulting non-convex optimization problem sub-optimally. Finally, simulation results demonstrate the significant improvement in DoA estimation accuracy achieved by the proposed 6DMA sensing scheme as compared to various benchmark schemes, for both isotropic and directive antenna radiation patterns.




Abstract:Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features attend to multiple or incorrect prompt tokens. In this work, we propose novel Semantic Protection Diffusion (SPDiffusion) to protect the semantics of regions from the influence of irrelevant tokens, eliminating the confusion of non-corresponding attributes. In the SPDiffusion framework, we design a Semantic Protection Mask (SP-Mask) to represent the relevance of the regions and the tokens, and propose a Semantic Protection Cross-Attention (SP-Attn) to shield the influence of irrelevant tokens on specific regions in the generation process. To evaluate our method, we created a diverse multi-concept benchmark, and SPDiffusion achieves state-of-the-art results on this benchmark, proving its effectiveness. Our method can be combined with many other application methods or backbones, such as ControlNet, Story Diffusion, PhotoMaker and PixArt-alpha to enhance their multi-concept capabilities, demonstrating strong compatibility and scalability.
Abstract:Automatic disease diagnosis has become increasingly valuable in clinical practice. The advent of large language models (LLMs) has catalyzed a paradigm shift in artificial intelligence, with growing evidence supporting the efficacy of LLMs in diagnostic tasks. Despite the growing attention in this field, many critical research questions remain under-explored. For instance, what diseases and LLM techniques have been investigated for diagnostic tasks? How can suitable LLM techniques and evaluation methods be selected for clinical decision-making? To answer these questions, we performed a comprehensive analysis of LLM-based methods for disease diagnosis. This scoping review examined the types of diseases, associated organ systems, relevant clinical data, LLM techniques, and evaluation methods reported in existing studies. Furthermore, we offered guidelines for data preprocessing and the selection of appropriate LLM techniques and evaluation strategies for diagnostic tasks. We also assessed the limitations of current research and delineated the challenges and future directions in this research field. In summary, our review outlined a blueprint for LLM-based disease diagnosis, helping to streamline and guide future research endeavors.