Abstract:This paper studies a passive source localization system, where a single base station (BS) is employed to estimate the positions and attitudes of multiple mobile stations (MSs). The BS and the MSs are equipped with uniform rectangular arrays, and the MSs are located in the near-field region of the BS array. To avoid the difficulty of tackling the problem directly based on the near-field signal model, we establish a subarray-wise far-field received signal model. In this model, the entire BS array is divided into multiple subarrays to ensure that each MS is in the far-field region of each BS subarray. By exploiting the angles of arrival (AoAs) of an MS antenna at different BS subarrays, we formulate the attitude and location estimation problem under the Bayesian inference framework. Based on the factor graph representation of the probabilistic problem model, a message passing algorithm named array partitioning based pose and location estimation (APPLE) is developed to solve this problem. An estimation-error lower bound is obtained as a performance benchmark of the proposed algorithm. Numerical results demonstrate that the proposed APPLE algorithm outperforms other baseline methods in the accuracy of position and attitude estimation.
Abstract:Semantic communication leverages artificial intelligence (AI) technologies to extract semantic information from data for efficient transmission, theraby significantly reducing communication cost. With the evolution towards artificial general intelligence (AGI), the increasing demands for AGI services pose new challenges to semantic communication. In response, we propose a new paradigm for AGI-driven communications, called generative semantic communication (GSC), which utilizes advanced AI technologies such as foundation models and generative models. We first describe the basic concept of GSC and its difference from existing semantic communications, and then introduce a general framework of GSC, followed by two case studies to verify the advantages of GSC in AGI-driven applications. Finally, open challenges and new research directions are discussed to stimulate this line of research and pave the way for practical applications.
Abstract:Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in largely underdetermined scenarios. On the other hand, score-based generative modeling offers a promising way to accurately characterize the sophisticated image distribution. In this paper, by exploiting the close relation between score-based modeling and empirical Bayes-optimal denoising, we devise a message passing framework that integrates a score-based minimum mean squared error (MMSE) denoiser for compressive image recovery. This framework is firmly rooted in Bayesian formalism, in which state evolution (SE) equations accurately predict its asymptotic performance. Experiments on the FFHQ dataset demonstrate that our method strikes a significantly better performance-complexity tradeoff than conventional message passing, regularized linear regression, and score-based posterior sampling baselines. Remarkably, our method typically requires less than 20 neural function evaluations (NFEs) to converge.
Abstract:The distortion-perception (DP) tradeoff reveals a fundamental conflict between distortion metrics (e.g., MSE and PSNR) and perceptual quality. Recent research has increasingly concentrated on evaluating denoising algorithms within the DP framework. However, existing algorithms either prioritize perceptual quality by sacrificing acceptable distortion, or focus on minimizing MSE for faithful restoration. When the goal shifts or noisy measurements vary, adapting to different points on the DP plane needs retraining or even re-designing the model. Inspired by recent advances in solving inverse problems using score-based generative models, we explore the potential of flexibly and optimally traversing DP tradeoffs using a single pre-trained score-based model. Specifically, we introduce a variance-scaled reverse diffusion process and theoretically characterize the marginal distribution. We then prove that the proposed sample process is an optimal solution to the DP tradeoff for conditional Gaussian distribution. Experimental results on two-dimensional and image datasets illustrate that a single score network can effectively and flexibly traverse the DP tradeoff for general denoising problems.
Abstract:This work considers an uplink asynchronous massive random access scenario in which a large number of users asynchronously access a base station equipped with multiple receive antennas. The objective is to alleviate the problem of massive collision due to the limited number of orthogonal preambles of an access scheme in which user activity detection is performed. We propose a user activity detection with delay-calibration (UAD-DC) algorithm and investigate the benefits of oversampling for the estimation of continuous time delays at the receiver. The proposed algorithm iteratively estimates time delays and detects active users by noting that the collided users can be identified through accurate estimation of time delays. Due to the sporadic user activity patterns, the user activity detection problem can be formulated as a compressive sensing (CS) problem, which can be solved by a modified Turbo-CS algorithm under the consideration of correlated noise samples resulting from oversampling. A sliding-window technique is applied in the proposed algorithm to reduce the overall computational complexity. Moreover, we propose a new design of the pulse shaping filter by minimizing the Bayesian Cram\'er-Rao bound of the detection problem under the constraint of limited spectral bandwidth. Numerical results demonstrate the efficacy of the proposed algorithm in terms of the normalized mean squared error of the estimated channel, the probability of misdetection and the successful detection ratio.
Abstract:This paper introduces Bifr\"ost, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. Previous methods concentrate on image compositing at the 2D level, which fall short in handling complex spatial relationships ($\textit{e.g.}$, occlusion). Bifr\"ost addresses these issues by training MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process to bridge the gap between 2D and 3D, which enhances spatial comprehension and supports sophisticated spatial interactions. Our method begins by fine-tuning MLLM with a custom counterfactual dataset to predict 2.5D object locations in complex backgrounds from language instructions. Then, the image-compositing model is uniquely designed to process multiple types of input features, enabling it to perform high-fidelity image compositions that consider occlusion, depth blur, and image harmonization. Extensive qualitative and quantitative evaluations demonstrate that Bifr\"ost significantly outperforms existing methods, providing a robust solution for generating realistically composed images in scenarios demanding intricate spatial understanding. This work not only pushes the boundaries of generative image compositing but also reduces reliance on expensive annotated datasets by effectively utilizing existing resources in innovative ways.
Abstract:Existing diffusion-based methods for inverse problems sample from the posterior using score functions and accept the generated random samples as solutions. In applications that posterior mean is preferred, we have to generate multiple samples from the posterior which is time-consuming. In this work, by analyzing the probability density evolution of the conditional reverse diffusion process, we prove that the posterior mean can be achieved by tracking the mean of each reverse diffusion step. Based on that, we establish a framework termed reverse mean propagation (RMP) that targets the posterior mean directly. We show that RMP can be implemented by solving a variational inference problem, which can be further decomposed as minimizing a reverse KL divergence at each reverse step. We further develop an algorithm that optimizes the reverse KL divergence with natural gradient descent using score functions and propagates the mean at each reverse step. Experiments demonstrate the validity of the theory of our framework and show that our algorithm outperforms state-of-the-art algorithms on reconstruction performance with lower computational complexity in various inverse problems.
Abstract:This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines.
Abstract:In this paper, we propose a learning-based block-wise planar channel estimator (LBPCE) with high accuracy and low complexity to estimate the time-varying frequency-selective channel of a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system. First, we establish a block-wise planar channel model (BPCM) to characterize the correlation of the channel across subcarriers and OFDM symbols. Specifically, adjacent subcarriers and OFDM symbols are divided into several sub-blocks, and an affine function (i.e., a plane) with only three variables (namely, mean, time-domain slope, and frequency-domain slope) is used to approximate the channel in each sub-block, which significantly reduces the number of variables to be determined in channel estimation. Second, we design a 3D dilated residual convolutional network (3D-DRCN) that leverages the time-frequency-space-domain correlations of the channel to further improve the channel estimates of each user. Numerical results demonstrate that the proposed significantly outperforms the state-of-the-art estimators and maintains a relatively low computational complexity.
Abstract:To exploit unprecedented data generation in mobile edge networks, federated learning (FL) has emerged as a promising alternative to the conventional centralized machine learning (ML). However, there are some critical challenges for FL deployment. One major challenge called straggler issue severely limits FL's coverage where the device with the weakest channel condition becomes the bottleneck of the model aggregation performance. Besides, the huge uplink communication overhead compromises the effectiveness of FL, which is particularly pronounced in large-scale systems. To address the straggler issue, we propose the integration of an unmanned aerial vehicle (UAV) as the parameter server (UAV-PS) to coordinate the FL implementation. We further employ over-the-air computation technique that leverages the superposition property of wireless channels for efficient uplink communication. Specifically, in this paper, we develop a novel UAV-enabled over-the-air asynchronous FL (UAV-AFL) framework which supports the UAV-PS in updating the model continuously to enhance the learning performance. Moreover, we conduct a convergence analysis to quantitatively capture the impact of model asynchrony, device selection and communication errors on the UAV-AFL learning performance. Based on this, a unified communication-learning problem is formulated to maximize asymptotical learning performance by optimizing the UAV-PS trajectory, device selection and over-the-air transceiver design. Simulation results demonstrate that the proposed scheme achieves substantially learning efficiency improvement compared with the state-of-the-art approaches.