Abstract:We investigate the problem of recovering a structured sparse signal from a linear observation model with an uncertain dynamic grid in the sensing matrix. The state-of-the-art expectation maximization based compressed sensing (EM-CS) methods, such as turbo compressed sensing (Turbo-CS) and turbo variational Bayesian inference (Turbo-VBI), have a relatively slow convergence speed due to the double-loop iterations between the E-step and M-step. Moreover, each inner iteration in the E-step involves a high-dimensional matrix inverse in general, which is unacceptable for problems with large signal dimensions or real-time calculation requirements. Although there are some attempts to avoid the high-dimensional matrix inverse by majorization minimization, the convergence speed and accuracy are often sacrificed. To better address this problem, we propose an alternating estimation framework based on a novel subspace constrained VBI (SC-VBI) method, in which the high-dimensional matrix inverse is replaced by a low-dimensional subspace constrained matrix inverse (with the dimension equal to the sparsity level). We further prove the convergence of the SC-VBI to a stationary solution of the Kullback-Leibler divergence minimization problem. Simulations demonstrate that the proposed SC-VBI algorithm can achieve a much better tradeoff between complexity per iteration, convergence speed, and performance compared to the state-of-the-art algorithms.
Abstract:Pilot pattern optimization in orthogonal frequency division multiplexing (OFDM) systems has been widely investigated due to its positive impact on channel estimation. In this paper, we consider the problem of multi-user pilot pattern optimization for OFDM systems. In particular, the goal is to enhance channel extrapolation performance for 5G NR systems by optimizing multi-user pilot patterns in frequency-domain. We formulate a novel pilot pattern optimization problem with the objective of minimizing the maximum integrated side-lobe level (ISL) among all users, subject to a statistical resolution limit (SRL) constraint. Unlike existing literature that only utilizes ISL for controlling side-lobe levels of the ambiguity function, we also leverage ISL to mitigate multi-user interference in code-domain multiplexing. Additionally, the introduced SRL constraint ensures sufficient delay resolution of the system to resolve multipath, thereby improving channel extrapolation performance. Then, we employ the estimation of distribution algorithm (EDA) to solve the formulated problem in an offline manner. Finally, we extend the formulated multi-user pilot pattern optimization problem to a multiband scenario, in which multiband gains can be exploited to improve system delay resolution. Simulation results demonstrate that the optimized pilot pattern yields significant performance gains in channel extrapolation over the conventional pilot patterns.
Abstract:In this paper, we consider a cooperative sensing framework in the context of future multi-functional network with both communication and sensing ability, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multiple access channel (MAC) for cooperative target localization. To improve the localization performance, we present a hybrid information-signal domain cooperative sensing (HISDCS) design, where each sensing receiver transmits both the estimated time delay/effective reflecting coefficient and the received sensing signal sampled around the estimated time delay to the FC. Then, we propose to minimize the number of channel uses by utilizing an efficient Karhunen-Lo\'eve transformation (KLT) encoding scheme for signal quantization and proper node selection, under the Cram\'er-Rao lower bound (CRLB) constraint and the capacity limits of MAC. A novel matrix-inequality constrained successive convex approximation (MCSCA) algorithm is proposed to optimize the wireless backhaul resource allocation, together with a greedy strategy for node selection. Despite the high non-convexness of the considered problem, we prove that the proposed MCSCA algorithm is able to converge to the set of Karush-Kuhn-Tucker (KKT) solutions of a relaxed problem obtained by relaxing the discrete variables. Besides, a low-complexity quantization bit reallocation algorithm is designed, which does not perform explicit node selection, and is able to harvest most of the performance gain brought by HISDCS. Finally, numerical simulations are presented to show that the proposed HISDCS design is able to significantly outperform the baseline schemes.
Abstract:We investigate a joint visibility region (VR) detection and channel estimation problem in extremely large-scale multiple-input-multiple-output (XL-MIMO) systems, where near-field propagation and spatial non-stationary effects exist. In this case, each scatterer can only see a subset of antennas, i.e., it has a certain VR over the antennas. Because of the spatial correlation among adjacent sub-arrays, VR of scatterers exhibits a two-dimensional (2D) clustered sparsity. We design a 2D Markov prior model to capture such a structured sparsity. Based on this, a novel alternating maximum a posteriori (MAP) framework is developed for high-accuracy VR detection and channel estimation. The alternating MAP framework consists of three basic modules: a channel estimation module, a VR detection module, and a grid update module. Specifically, the first module is a low-complexity inverse-free variational Bayesian inference (IF-VBI) algorithm that avoids the matrix inverse via minimizing a relaxed Kullback-Leibler (KL) divergence. The second module is a structured expectation propagation (EP) algorithm which has the ability to deal with complicated prior information. And the third module refines polar-domain grid parameters via gradient ascent. Simulations demonstrate the superiority of the proposed algorithm in both VR detection and channel estimation.
Abstract:Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platform for developing embodied agents using LLMs and LMMs. LEGENT offers a dual approach: a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface, and a sophisticated data generation pipeline utilizing advanced algorithms to exploit supervision from simulated worlds at scale. In our experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks, showcasing promising generalization capabilities.
Abstract:This paper introduces a cooperative sensing framework designed for integrated sensing and communication cellular networks. The framework comprises one base station (BS) functioning as the sensing transmitter, while several nearby BSs act as sensing receivers. The primary objective is to facilitate cooperative target localization by enabling each receiver to share specific information with a fusion center (FC) over a limited capacity backhaul link. To achieve this goal, we propose an advanced cooperative sensing design that enhances the communication process between the receivers and the FC. Each receiver independently estimates the time delay and the reflecting coefficient associated with the reflected path from the target. Subsequently, each receiver transmits the estimated values and the received signal samples centered around the estimated time delay to the FC. To efficiently quantize the signal samples, a Karhunen-Lo\`eve Transform coding scheme is employed. Furthermore, an optimization problem is formulated to allocate backhaul resources for quantizing different samples, improving target localization. Numerical results validate the effectiveness of our proposed advanced design and demonstrate its superiority over a baseline design, where only the locally estimated values are transmitted from each receiver to the FC.
Abstract:While Large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models. One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets. However, this method can be both resource and time-intensive, and not applicable to closed-source commercial LLMs. In this paper, we propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA), a method designed to augment the domain-specific capabilities of LLMs by leveraging insights from the response preference of expert models without requiring fine-tuning. Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks. Moreover, LLM with PANDA even outperforms the expert model that being learned on 4 tasks of ScienceWorld. This finding highlights the potential of exploring tuning-free approaches to achieve weak-to-strong generalization.
Abstract:The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents remains limited when operating in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we review the current agent research and highlight the neglected factors in existing agent benchmarks and method candidates. We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial design of our agent, and benchmark its performance with several candidate baselines in the retrofitted WebShop. The extensive experimental results further prove the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research with improved general problem-solving abilities.
Abstract:Federated learning (FL) is a machine learning paradigm where the clients possess decentralized training data and the central server handles aggregation and scheduling. Typically, FL algorithms involve clients training their local models using stochastic gradient descent (SGD), which carries drawbacks such as slow convergence and being prone to getting stuck in suboptimal solutions. In this work, we propose a message passing based Bayesian federated learning (BFL) framework to avoid these drawbacks.Specifically, we formulate the problem of deep neural network (DNN) learning and compression and as a sparse Bayesian inference problem, in which group sparse prior is employed to achieve structured model compression. Then, we propose an efficient BFL algorithm called EMTDAMP, where expectation maximization (EM) and turbo deep approximate message passing (TDAMP) are combined to achieve distributed learning and compression. The central server aggregates local posterior distributions to update global posterior distributions and update hyperparameters based on EM to accelerate convergence. The clients perform TDAMP to achieve efficient approximate message passing over DNN with joint prior distribution. We detail the application of EMTDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EMTDAMP.
Abstract:Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, when building KGD systems in real applications, there are various real-world noises that are inevitable to face. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs typically suffer from incompletion and also might contain erroneous and outdated facts. Such real-world noises pose a challenge to the robustness of KGD systems and hinder their applications in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework ensures the KGD model is aware of these two types of perturbations, thus generating informative responses with the potentially noisy inputs in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potentiality. Furthermore, we show that our method can generate better responses than comparison models in both the noisy and the few-shot settings.