University of California San Diego, USA
Abstract: Data plays a crucial role in training learning-based methods for 3D point cloud registration. However, real-world datasets are expensive to build, while rendering-based synthetic data suffers from domain gaps. In this work, we present PointRegGPT, which boosts 3D point cloud registration by using generated point-cloud pairs for training. Given a single depth map, we first apply a random camera motion to re-project it into a target depth map. Converting both depth maps to point clouds gives a training pair. To enhance data realism, we formulate the generative model as a depth inpainting diffusion that processes the target depth map with the re-projected source depth map as the condition. We also design a depth correction module to alleviate artifacts caused by point penetration during re-projection. To our knowledge, this is the first generative approach that explores realistic data generation for indoor point cloud registration. When equipped with our approach, several recent algorithms improve their performance significantly and consistently achieve state-of-the-art (SOTA) results on two common benchmarks. The code and dataset will be released at https://github.com/Chen-Suyi/PointRegGPT.
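The geometric step described in this abstract (apply a random camera motion to a depth map, re-project it, and convert both maps to point clouds) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `backproject`, `random_motion`, `reproject`, and `make_pair`, the pinhole intrinsics `K`, and the motion ranges are all assumptions made for illustration. The diffusion-based inpainting and the depth correction module are not modeled; a plain z-buffer stands in for occlusion handling, and holes in the warped map are simply left at zero.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to an (N, 3) point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # zero depth marks a missing pixel
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def random_motion(rng, max_angle=0.1, max_shift=0.1):
    """Sample a small rigid motion (R, t); the ranges are illustrative assumptions."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle, max_angle)
    skew = np.array([[0.0, -axis[2], axis[1]],
                     [axis[2], 0.0, -axis[0]],
                     [-axis[1], axis[0], 0.0]])
    # Rodrigues' rotation formula for a unit axis
    R = np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * skew @ skew
    t = rng.uniform(-max_shift, max_shift, size=3)
    return R, t

def reproject(depth, K, R, t):
    """Warp a source depth map into the target view, keeping the nearest depth per pixel."""
    h, w = depth.shape
    pts = backproject(depth, K) @ R.T + t  # source points expressed in the target frame
    pts = pts[pts[:, 2] > 1e-6]            # drop points behind the target camera
    z = pts[:, 2]
    u = np.round(pts[:, 0] * K[0, 0] / z + K[0, 2]).astype(int)
    v = np.round(pts[:, 1] * K[1, 1] / z + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    target = np.full((h, w), np.inf)
    np.minimum.at(target, (v[inside], u[inside]), z[inside])  # simple z-buffer
    target[np.isinf(target)] = 0.0         # holes that a generative model would fill
    return target

def make_pair(depth, K, rng):
    """Return (source cloud, target cloud, R, t) from a single depth map."""
    R, t = random_motion(rng)
    target_depth = reproject(depth, K, R, t)
    return backproject(depth, K), backproject(target_depth, K), R, t
```

Under these assumptions, `make_pair` yields two clouds related by the known motion `(R, t)`, which could serve as the ground-truth transform for a registration training sample.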
Abstract: In this paper, we investigate the use of diffusion models pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, materials, locations, and their compositions in 3D scenes. We show that it outperforms competitive baselines and achieves significant improvements over state-of-the-art methods. In particular, Diff2Scene improves on the state-of-the-art method on ScanNet200 by 12%.
Abstract: Image-based inspection systems have been widely deployed in manufacturing production lines. Due to the scarcity of defective samples, unsupervised anomaly detection, which leverages only normal samples during training to detect various defects, is popular. Existing feature-based methods, which utilize deep features from pretrained neural networks, show impressive performance in anomaly localization and require few training samples. However, the anomalous regions detected by these methods always exhibit inaccurate boundaries, which impedes downstream tasks. This deficiency has two causes: (i) the decreased resolution of high-level features compared with the original image, and (ii) the mixture of adjacent normal and anomalous pixels during feature extraction. To address them, we propose a novel unified optimization framework (F2PAD) that leverages Feature-level information to guide the optimization process for Pixel-level Anomaly Detection in the inference stage. The proposed framework is universal and plug-and-play, and can enhance various feature-based methods with limited assumptions. Case studies demonstrate the effectiveness of our strategy, particularly when applied to three popular backbone methods: PaDiM, CFLOW-AD, and PatchCore.
Abstract: In this letter, we study the energy efficiency maximization problem for a fluid antenna system (FAS) in near-field communications. Specifically, we consider a point-to-point near-field system in which the base station (BS) transmitter has multiple fixed-position antennas and the user receives the signals with multiple fluid antennas. Our objective is to jointly optimize the transmit beamforming at the BS and the fluid antenna positions at the user to maximize the energy efficiency. Our scheme is based on an alternating optimization algorithm that iteratively solves the beamforming and antenna position subproblems. Simulation results validate the performance improvement of the proposed algorithm and confirm the effectiveness of FAS.
Abstract: The advent of sixth-generation (6G) networks presents another round of revolution for the mobile communication landscape, promising an immersive experience, robust reliability, minimal latency, extreme connectivity, ubiquitous coverage, and capabilities beyond communication, including intelligence and sensing. To achieve these ambitious goals, 6G networks need to incorporate state-of-the-art technologies. One technology that has garnered rising interest is the fluid antenna system (FAS), which represents any software-controllable fluidic, conductive, or dielectric structure capable of dynamically changing its shape and position to reconfigure essential radio-frequency (RF) characteristics. Compared to traditional antenna systems (TASs) with fixed-position radiating elements, the core idea of FAS revolves around the unique flexibility of reconfiguring the radiating elements within a given space. One recent driver of FAS is the recognition of its position flexibility as a new degree of freedom (DoF) to harness diversity and multiplexing gains. In this paper, we provide a comprehensive tutorial covering channel modeling, signal processing and estimation methods, information-theoretic insights, new multiple access techniques, and hardware designs. Moreover, we delineate the challenges of FAS and explore the potential of using FAS to improve the performance of other contemporary technologies. By providing insights and guidance, this tutorial paper serves to inspire researchers to explore new horizons and fully unleash the potential of FAS.
Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interferes not only with Eve but also with the legitimate receiver, in this letter, we consider that encoded codewords are transmitted to jam Eve. Then, by employing appropriate coding schemes, the legitimate receiver can successfully decode the jamming signal and then cancel the interference, while Eve cannot, even if it knows the codebooks. We aim to maximize the secrecy rate through port selection and power control. Although the problem is non-convex, we show that the optimal solution can be found. Simulation results show that by using the FAS technique and the proposed jamming scheme, the secrecy rate of the system can be significantly increased.
Abstract: Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represented by their contexts. Building on this realistic and generalized contextual market model, we introduce MarketFCNet, a deep learning-based method for approximating market equilibrium. We start by parameterizing the allocation of each good to each buyer using a neural network, which depends solely on the context of the buyer and the good. Next, we propose an efficient method to obtain an unbiased estimate of the training loss, enabling us to optimize the network parameters through gradient descent. To evaluate the approximated solution, we introduce a metric called Nash Gap, which quantifies the deviation of the given allocation and price pair from the market equilibrium. Experimental results indicate that MarketFCNet delivers competitive performance and significantly lower running times than existing methods as the market scale expands, demonstrating the potential of deep learning-based methods to accelerate the approximation of large-scale contextual market equilibrium.
Abstract: Automated segmentation of cardiac magnetic resonance (CMR) images plays a pivotal role in efficiently assessing cardiac function, offering rapid clinical evaluations that benefit both healthcare practitioners and patients. While recent research has primarily focused on delineating structures in the short-axis orientation, less attention has been given to long-axis representations, mainly due to the complex nature of structures in this orientation. Performing pixel-wise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady-state free precession (SSFP) cine sequences is a crucial preprocessing stage for various analyses. However, the challenge lies in the significant variability in contrast, appearance, orientation, and positioning of the heart across different patients, clinical views, scanners, and imaging protocols. Consequently, achieving fully automatic semantic segmentation in this context is notoriously challenging. In recent years, several deep learning models have been proposed to accurately quantify and diagnose cardiac pathologies. These automated tools rely heavily on the accurate segmentation of cardiac structures in magnetic resonance images (MRI). Hence, there is a need for new methods that handle the geometrical and textural complexities of such structures. We propose 2D and 3D two-stage self-supervised deep learning segmentation architectures, based on hybrid transformer and CNN designs, for four-chamber (4CH) whole-heart segmentation. Accurate segmentation of the ventricles and atria in 4CH views is crucial for analyzing heart health and reconstructing four-chamber meshes, which are essential for estimating various parameters to assess overall heart condition. Our proposed method outperforms state-of-the-art techniques in this domain.
Abstract: The fluid antenna system (FAS) has recently surfaced as a promising technology for the upcoming sixth-generation (6G) wireless networks. Unlike a traditional antenna system (TAS) with a fixed antenna location, FAS introduces a flexible component where the radiating element can switch its position within a predefined space. This capability allows FAS to achieve additional diversity and multiplexing gains. Nevertheless, to fully reap the benefits of FAS, obtaining channel state information (CSI) over the predefined space is crucial. In this paper, we explore the interaction between a transmitter equipped with a traditional antenna and a receiver with a fluid antenna over an electromagnetic-compliant channel model. We address the challenges of channel estimation and reconstruction using Nyquist sampling and maximum likelihood estimation (MLE) methods. Our analysis reveals a fundamental tradeoff between the accuracy of the reconstructed channel and the number of estimated channels, indicating that half-wavelength sampling is insufficient for perfect reconstruction and that oversampling is essential to enhance accuracy. Despite its advantages, oversampling can introduce practical challenges. Consequently, we propose a suboptimal sampling distance that facilitates efficient channel reconstruction. In addition, we employ the MLE method to bound the channel estimation error by $\epsilon$, with a specific confidence interval (CI). Our findings enable us to determine the minimum number of estimated channels and the total number of pilot symbols required for efficient channel reconstruction in a given space. Lastly, we investigate the rate performance of FAS and TAS and demonstrate that FAS with imperfect CSI can outperform TAS with perfect CSI.
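The idea of reconstructing a channel over a continuous space from uniformly spaced estimates can be illustrated with textbook sinc interpolation. The sketch below is a toy illustration rather than the paper's method: the sum-of-plane-waves channel in `toy_channel`, the function names, and all parameters are assumptions, and neither the electromagnetic-compliant channel model nor the MLE step is reproduced.

```python
import numpy as np

def toy_channel(x, angles, gains, wavelength=1.0):
    """Hypothetical 1-D channel: a sum of plane waves arriving from the given angles."""
    phase = 2j * np.pi * np.outer(x, np.cos(angles)) / wavelength
    return np.exp(phase) @ gains

def sinc_reconstruct(samples, sample_pos, query_pos):
    """Sinc interpolation from a finite set of uniformly spaced samples."""
    d = sample_pos[1] - sample_pos[0]  # sampling distance
    kernel = np.sinc((query_pos[:, None] - sample_pos[None, :]) / d)
    return kernel @ samples

def nmse(h_true, h_hat):
    """Normalized mean squared error between the true and reconstructed channel."""
    return np.sum(np.abs(h_true - h_hat) ** 2) / np.sum(np.abs(h_true) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wavelength, aperture = 1.0, 10.0           # aperture in wavelengths (assumed)
    angles = rng.uniform(0.0, np.pi, size=8)
    gains = rng.normal(size=8) + 1j * rng.normal(size=8)
    query = np.linspace(0.0, aperture, 1001)   # dense grid for evaluation
    h_true = toy_channel(query, angles, gains, wavelength)
    for d in (wavelength / 2, wavelength / 4, wavelength / 8):
        pos = np.arange(0.0, aperture + 1e-9, d)
        h_hat = sinc_reconstruct(toy_channel(pos, angles, gains, wavelength), pos, query)
        print(f"spacing {d:.3f} wavelengths: NMSE = {nmse(h_true, h_hat):.3e}")
```

Running the script prints the reconstruction error for each sampling distance on this toy channel, so the effect of a finite number of samples can be inspected directly. The resulting numbers depend on the assumed channel and aperture and should not be read as the paper's results.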
Abstract: As an emerging antenna technology, a fluid antenna system (FAS) enhances spatial diversity to improve both sensing and communication performance by shifting the active antennas among available ports. In this letter, we study the potential of shifting the integrated sensing and communication (ISAC) trade-off with FAS. We propose a model for FAS-enabled ISAC and jointly optimize the transmit beamforming and port selection of FAS. In particular, we aim to minimize the transmit power while satisfying both communication and sensing requirements. An efficient iterative algorithm based on sparse optimization, convex approximation, and a penalty approach is developed. The simulation results show that the proposed scheme can attain a 33% reduction in transmit power with guaranteed sensing and communication performance, showing the great potential of the fluid antenna for striking a flexible trade-off between sensing and communication in ISAC systems.