IEEE
Abstract:Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), a method that accelerates LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of prompt tuning, we equip LLMs with a parameter-efficient design called bi-directional tuning, which gives them the capability of semi-autoregressive generation. Employing efficient tree-based decoding, the models perform draft candidate generation and verification in parallel, ensuring outputs identical to their autoregressive counterparts under greedy sampling. BiTA serves as a lightweight plug-in module, seamlessly boosting the inference efficiency of existing LLMs without requiring auxiliary models or incurring significant extra memory costs. With the proposed BiTA, LLaMA-2-70B-Chat achieves a 2.7$\times$ speedup on the MT-Bench benchmark. Extensive experiments confirm that our method surpasses state-of-the-art acceleration techniques.
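The lossless property claimed in the abstract above (outputs identical to autoregressive decoding under greedy sampling) can be illustrated with a minimal sketch of draft verification. Everything here is an assumption for illustration: toy_greedy_next is a hypothetical stand-in for an LLM's argmax next-token choice, noisy_drafter is a hypothetical separate drafter (BiTA itself drafts with the same model), and the verification loop runs sequentially, whereas a real system would check all draft positions in one parallel forward pass. Tree-based decoding is not modeled.

```python
from typing import Callable, List, Sequence

Token = int
NextToken = Callable[[Sequence[Token]], Token]


def toy_greedy_next(context: Sequence[Token]) -> Token:
    """Hypothetical stand-in for a model's greedy (argmax) next-token choice."""
    return (sum(context) * 31 + len(context)) % 50


def autoregressive(next_token: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Reference decoder: one new token per model call."""
    out = list(prompt)
    for _ in range(n):
        out.append(next_token(out))
    return out[len(prompt):]


def verify_draft(next_token: NextToken, context: Sequence[Token],
                 draft: Sequence[Token]) -> List[Token]:
    """Keep the longest draft prefix that matches greedy decoding, then append
    the model's own token at the first mismatch (or after the whole draft)."""
    accepted: List[Token] = []
    for tok in draft:
        target = next_token(list(context) + accepted)
        if tok != target:
            accepted.append(target)  # replace the wrong guess and stop
            return accepted
        accepted.append(tok)
    accepted.append(next_token(list(context) + accepted))  # bonus token
    return accepted


def noisy_drafter(context: Sequence[Token], k: int = 4) -> List[Token]:
    """Hypothetical drafter: k guesses, the last one deliberately wrong."""
    guess: List[Token] = []
    for i in range(k):
        t = toy_greedy_next(list(context) + guess)
        guess.append(t if i < k - 1 else (t + 1) % 50)
    return guess


def draft_and_verify(next_token: NextToken, drafter, prompt: List[Token],
                     n: int) -> List[Token]:
    """Alternate drafting and verification until n new tokens are produced."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.extend(verify_draft(next_token, out, drafter(out)))
    return out[len(prompt):len(prompt) + n]


if __name__ == "__main__":
    prompt = [3, 1, 4]
    fast = draft_and_verify(toy_greedy_next, noisy_drafter, prompt, 10)
    slow = autoregressive(toy_greedy_next, prompt, 10)
    print(fast == slow)
```

Because every accepted token is checked against the greedy choice, a bad draft can only reduce the speedup, never change the output.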
Abstract:Next-generation wireless networks are expected to utilize the limited radio frequency (RF) resources more efficiently with the aid of intelligent transceivers. To this end, we propose a promising transceiver architecture relying on stacked intelligent metasurfaces (SIM). An SIM is constructed by stacking an array of programmable metasurface layers, where each layer consists of a massive number of low-cost passive meta-atoms that individually manipulate the electromagnetic (EM) waves. By appropriately configuring the passive meta-atoms, an SIM is capable of accomplishing advanced computation and signal processing tasks, such as multiple-input multiple-output (MIMO) precoding/combining, multi-user interference mitigation, and radar sensing, as the EM wave propagates through the multiple layers of the metasurface, which effectively reduces both the RF-related energy consumption and processing delay. Inspired by this, we provide an overview of the SIM-aided MIMO transceiver design, which encompasses its hardware architecture and its potential benefits over state-of-the-art solutions. Furthermore, we discuss promising application scenarios and identify the open research challenges associated with the design of advanced SIM architectures for next-generation wireless networks. Finally, numerical results are provided for quantifying the benefits of wave-based signal processing in wireless systems.
Abstract:Integrating sensing functionalities is envisioned as a distinguishing feature of next-generation mobile networks, which has given rise to a novel enabling technology, namely integrated sensing and communication (ISAC). Characterizing the theoretical performance bounds of ISAC systems is fundamentally important for understanding how sensing and communication functionalities interact (e.g., competitively or cooperatively) in terms of resource utilization, while revealing insights and guidelines for the development of effective physical-layer techniques. In this paper, we characterize the fundamental performance tradeoff between the detection probability for target monitoring and the user's achievable rate in ISAC systems. To this end, we first discuss the achievable rate of the user under sensing-free and sensing-interfered communication scenarios. Furthermore, we derive closed-form expressions for the probability of false alarm (PFA) and the probability of successful detection (PD) for monitoring the target of interest, where we consider both communication-assisted and communication-interfered sensing scenarios. In addition, the effects of the unknown channel coefficient are taken into account in our theoretical analysis. Based on our analytical results, we then carry out a comprehensive assessment of the performance tradeoff between sensing and communication functionalities. Specifically, we formulate a power allocation problem to minimize the transmit power at the base station (BS) under constraints that ensure a required PD for sensing as well as the communication user's quality-of-service requirement in terms of achievable rate. Finally, simulation results corroborate the accuracy of our theoretical analysis and the effectiveness of the proposed power allocation solutions.
Abstract:Conventional meta-atom designs rely heavily on researchers' prior knowledge and trial-and-error searches using full-wave simulations, resulting in time-consuming and inefficient processes. Inverse design methods based on optimization algorithms, such as evolutionary algorithms and topology optimization, have been introduced to design metamaterials. However, none of these algorithms is general enough to handle multi-objective tasks. Recently, deep learning methods represented by Generative Adversarial Networks (GANs) have been applied to the inverse design of metamaterials, and can directly generate high-degree-of-freedom meta-atoms based on S-parameter requirements. However, the adversarial training process of GANs makes the network unstable and results in high modeling costs. This paper proposes a novel metamaterial inverse design method based on the diffusion probability theory. By learning the Markov process that transforms the original structure into a Gaussian distribution, the proposed method can gradually remove the noise starting from the Gaussian distribution and generate new high-degree-of-freedom meta-atoms that meet S-parameter conditions. This avoids the model instability introduced by the adversarial training process of GANs and yields more accurate, higher-quality generation results. Experiments show that our method outperforms representative GAN-based methods in terms of model convergence speed, generation accuracy, and quality.
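The Markov process mentioned in the abstract above, which gradually turns a structure into Gaussian noise, can be sketched as follows. This assumes a standard DDPM-style forward process with a linear noise schedule; the schedule values, the function names, and the flattened binary pattern are illustrative assumptions and are not taken from the paper. The learned reverse (denoising) network and the S-parameter conditioning are not shown.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule endpoints are assumed values; num_steps must be at least 2.
    """
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    pattern = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # toy flattened meta-atom
    alpha_bars = alpha_bar_schedule(1000)
    early = forward_noise(pattern, alpha_bars[0], rng)   # still close to pattern
    late = forward_noise(pattern, alpha_bars[-1], rng)   # close to pure noise
    print(early)
    print(late)
```

As alpha_bar shrinks toward zero, the signal term vanishes and the sample approaches a standard Gaussian, which is the starting point for generation.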
Abstract:The user-centric cell-free network has emerged as an appealing technology to improve the next-generation wireless network's capacity thanks to its ability to eliminate inter-cell interference effectively. However, the cell-free network inevitably brings in higher hardware cost and backhaul overhead as a larger number of base stations (BSs) are deployed. Additionally, severe channel fading in high-frequency bands constitutes another crucial issue that limits the practical application of the cell-free network. In order to address the above challenges, we amalgamate the cell-free system with another emerging technology, namely reconfigurable intelligent surface (RIS), which can provide high spectrum and energy efficiency with low hardware cost by reshaping the wireless propagation environment intelligently. To this end, we formulate a weighted sum-rate (WSR) maximization problem for RIS-assisted cell-free systems by jointly optimizing the BS precoding matrix and the RIS reflection coefficient vector. Subsequently, we transform the complicated WSR problem to a tractable optimization problem and propose a distributed cooperative alternating direction method of multipliers (ADMM) to fully utilize parallel computing resources. Inspired by the model-based algorithm unrolling concept, we unroll our solver to a learning-based deep distributed ADMM (D-ADMM) network framework. To improve the efficiency of the D-ADMM in distributed BSs, we develop a monodirectional information exchange strategy with a small signaling overhead. In addition to benefiting from domain knowledge, D-ADMM adaptively learns hyper-parameters and non-convex solvers of the intractable RIS design problem through data-driven end-to-end training.
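The weighted sum-rate objective named in the abstract above can be written out as a small sketch. It assumes single-stream users and uses the standard definition, the sum over users of w_k * log2(1 + SINR_k). The matrix gains is a hypothetical input holding effective power gains, where gains[k][j] is the power user k receives from the stream intended for user j after the BS precoders and RIS reflection coefficients have been applied. The paper's exact system model may differ.

```python
import math


def sinr(gains, noise_power, k):
    """SINR of user k: desired power over interference plus noise."""
    interference = sum(g for j, g in enumerate(gains[k]) if j != k)
    return gains[k][k] / (interference + noise_power)


def weighted_sum_rate(weights, gains, noise_power):
    """Sum over users of w_k * log2(1 + SINR_k), in bit/s/Hz."""
    return sum(
        w * math.log2(1.0 + sinr(gains, noise_power, k))
        for k, w in enumerate(weights)
    )


if __name__ == "__main__":
    # Two users with hypothetical effective gains and unit noise power.
    gains = [[4.0, 0.5],
             [0.2, 3.0]]
    print(weighted_sum_rate([1.0, 2.0], gains, noise_power=1.0))
```

In the joint design problem, the precoding matrix and the RIS reflection vector enter only through these gains, which is why the objective couples the two sets of variables.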
Abstract:We consider the problem of channel estimation and joint active and passive beamforming for reconfigurable intelligent surface (RIS) assisted millimeter wave (mmWave) multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. We show that, with a well-designed frame-based training protocol, the received pilot signal can be organized into a low-rank third-order tensor that admits a canonical polyadic decomposition (CPD). Based on this observation, we propose two CPD-based methods for estimating the cascade channels associated with different subcarriers. The proposed methods exploit the intrinsic low-rankness of the CPD formulation, which is a result of the sparse scattering characteristics of mmWave channels, and thus have the potential to achieve a significant training overhead reduction. Specifically, our analysis shows that the proposed methods have a sample complexity that scales quadratically with the sparsity of the cascade channel. Also, by utilizing the singular value decomposition-like structure of the effective channel, this paper develops a joint active and passive beamforming method based on the estimated cascade channels. Simulation results show that the proposed CPD-based channel estimation methods attain mean square errors that are close to the Cramer-Rao bound (CRB) and present a clear advantage over the compressed sensing-based method. In addition, the proposed joint beamforming method can effectively utilize the estimated channel parameters to achieve superior beamforming performance.
Abstract:Federated learning (FL) is an emerging machine learning paradigm that allows model training to be accomplished without aggregating data at a central server. Most studies on FL consider a centralized framework, in which a single server is endowed with a central authority to coordinate a number of devices to perform model training in an iterative manner. Due to stringent communication and bandwidth constraints, such a centralized framework has limited scalability as the number of devices grows. To address this issue, in this paper, we propose a ConFederated Learning (CFL) framework. The proposed CFL consists of multiple servers, in which each server is connected with an individual set of devices as in the conventional FL framework, and decentralized collaboration is leveraged among servers to make full use of the data dispersed throughout the network. We develop an alternating direction method of multipliers (ADMM) algorithm for CFL. The proposed algorithm employs a random scheduling policy which randomly selects a subset of devices to access their respective servers at each iteration, thus alleviating the need to upload a huge amount of information from devices to servers. Theoretical analysis is presented to justify the proposed method. Numerical results show that the proposed method can converge to a decent solution significantly faster than gradient-based FL algorithms, thus offering a substantial advantage in terms of communication efficiency.
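The multi-server topology and the random scheduling policy described in the abstract above can be sketched as follows. The abstract only states that a random subset of devices accesses its server at each iteration, so uniform sampling without replacement, the fraction parameter, and the function name schedule_round are all assumptions made for illustration. The ADMM updates themselves are not shown.

```python
import math
import random


def schedule_round(server_devices, fraction, rng):
    """Pick, for each server, a random subset of its own devices.

    server_devices maps a server id to a non-empty list of device ids.
    At least one device per server is selected in every round.
    """
    selected = {}
    for server, devices in server_devices.items():
        k = max(1, math.ceil(fraction * len(devices)))
        selected[server] = sorted(rng.sample(devices, k))
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    topology = {
        "server_a": [0, 1, 2, 3, 4, 5, 6, 7],
        "server_b": [8, 9, 10, 11],
    }
    for iteration in range(3):
        print(iteration, schedule_round(topology, 0.25, rng))
```

Only the selected devices upload in a given round, so the per-iteration uplink traffic scales with the chosen fraction rather than with the total number of devices.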
Abstract:We consider the problem of spatial channel covariance matrix (CCM) estimation for intelligent reflecting surface (IRS)-assisted millimeter wave (mmWave) communication systems. Spatial CCM is essential for two-timescale beamforming in IRS-assisted systems; however, estimating the spatial CCM is challenging due to the passive nature of reflecting elements and the large size of the CCM resulting from the massive number of reflecting elements of the IRS. In this paper, we propose a CCM estimation method that exploits the low-rankness as well as the positive semi-definite (PSD) 3-level Toeplitz structure of the CCM. Estimation of the CCM is formulated as a semidefinite programming (SDP) problem, and an alternating direction method of multipliers (ADMM) algorithm is developed to solve it. Our analysis shows that the proposed method is theoretically guaranteed to attain a reliable CCM estimate with a sample complexity much smaller than the dimension of the CCM. Thus the proposed method can help achieve a significant training overhead reduction. Simulation results are presented to illustrate the effectiveness of our proposed method and the performance of the two-timescale beamforming scheme based on the estimated CCM.
Abstract:We consider the problem of downlink channel estimation for intelligent reflecting surface (IRS)-assisted millimeter wave (mmWave) orthogonal frequency division multiplexing (OFDM) systems. By exploring the inherent sparse scattering characteristics of mmWave channels, we show that the received signals can be expressed as a low-rank third-order tensor that admits a tensor rank decomposition, also known as canonical polyadic decomposition (CPD). A structured CPD-based method is then developed to estimate the channel parameters. Our analysis reveals that the training overhead required by our proposed method is as low as $O(U^2)$, where $U$ denotes the sparsity of the cascade channel. Simulation results are provided to illustrate the efficiency of the proposed method.
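The low-rank structure that the abstract above relies on can be made concrete with a minimal sketch of a third-order tensor built from CPD factors, T[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]. The factor matrices here are arbitrary toy values (Python complex numbers would work equally well), and the function names are hypothetical. The structured factors of the actual channel model, such as steering vectors, are not modeled.

```python
def cpd_tensor(A, B, C):
    """Build T[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r].

    A, B and C are factor matrices (lists of rows) with the same number
    of columns, which equals the CPD rank.
    """
    rank = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def parameter_counts(A, B, C):
    """Return (number of tensor entries, number of factor entries)."""
    rank = len(A[0])
    entries = len(A) * len(B) * len(C)
    factors = (len(A) + len(B) + len(C)) * rank
    return entries, factors


if __name__ == "__main__":
    # Rank-2 toy factors for a 4 x 3 x 5 tensor.
    A = [[1.0, 0.5], [0.0, 1.0], [2.0, -1.0], [1.0, 1.0]]
    B = [[1.0, 2.0], [0.5, 0.0], [-1.0, 1.0]]
    C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [-1.0, 3.0]]
    T = cpd_tensor(A, B, C)
    print(parameter_counts(A, B, C))
```

When the rank is small relative to the tensor dimensions, the factors hold far fewer unknowns than the full tensor, which is the intuition behind the reduced training overhead.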
Abstract:Reconfigurable intelligent surface (RIS) has recently emerged as a promising paradigm for future cellular networks. Specifically, due to its capability of reshaping the propagation environment, RIS has been introduced to address the blockage issue in millimeter wave (mmWave) or even terahertz (THz) communications. The deployment of RIS, however, complicates the system architecture and poses a significant challenge for beam training (BT)/beam alignment (BA), a process that is required to establish a reliable link between the transmitter and the receiver. In this article, we first review several state-of-the-art beam training solutions for RIS-assisted mmWave systems and discuss their respective advantages and limitations. We also present a new multi-directional BT method, which can achieve a decent BA performance with only a small amount of training overhead. Finally, we outline several important open issues in BT for RIS-assisted mmWave systems.