Xiaoming Chen

Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University

Semantics-Guided Diffusion for Deep Joint Source-Channel Coding in Wireless Image Transmission

Jan 02, 2025

Maojun Zhang, Haotian Wu, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Deniz Gündüz

Abstract:Joint source-channel coding (JSCC) offers a promising avenue for enhancing transmission efficiency by jointly incorporating source and channel statistics into the system design. A key advancement in this area is deep joint source-channel coding (DeepJSCC), which designs a direct mapping from input signals to channel symbols, parameterized by a neural network that can be trained for arbitrary channel models and semantic quality metrics. This paper advances the DeepJSCC framework toward a semantics-aligned, high-fidelity transmission approach, called semantics-guided diffusion DeepJSCC (SGD-JSCC). Existing schemes that integrate diffusion models (DMs) with JSCC face challenges in turning random generation into accurate reconstruction and in adapting to varying channel conditions. SGD-JSCC incorporates two key innovations: (1) using inherent information that contributes to the semantics of an image, such as a text description or an edge map, to guide the diffusion denoising process; and (2) enabling seamless adaptation to varying channel conditions with the help of a semantics-guided DM for channel denoising. The DM is guided by diverse semantic information and integrates seamlessly with DeepJSCC. In a slow fading channel, SGD-JSCC dynamically adapts to the instantaneous signal-to-noise ratio (SNR) estimated directly from the channel output, thereby eliminating the need for additional pilot transmissions for channel estimation. In a fast fading channel, we introduce a training-free denoising strategy that allows SGD-JSCC to adjust effectively to fluctuations in channel gains. Numerical results demonstrate that, guided by semantic information and leveraging the powerful DM, our method outperforms existing DeepJSCC schemes and delivers satisfactory reconstruction performance even under extremely poor channel conditions.

* 13 pages, submitted to IEEE for possible publication
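
The abstract states that, in a slow fading channel, SGD-JSCC adapts to the instantaneous SNR estimated directly from the channel output rather than from pilots. The paper performs this estimation with its semantics-guided DM; the Python sketch below only illustrates why pilot-free estimation is possible in principle, using a simple second-moment estimator instead. It assumes a real-valued block of symbols normalized to unit average power (a common DeepJSCC encoder step), a single real fading gain `h` per block, and a noise variance known at the receiver. All function names are hypothetical.

```python
import math
import random

def power_normalize(z, power=1.0):
    """Scale a latent vector so its average per-symbol power equals `power`."""
    avg = sum(v * v for v in z) / len(z)
    scale = math.sqrt(power / avg)
    return [v * scale for v in z]

def slow_fading_channel(x, h, noise_var, rng):
    """y = h * x + n, with one real gain h for the whole block (slow fading)."""
    std = math.sqrt(noise_var)
    return [h * v + rng.gauss(0.0, std) for v in x]

def estimate_snr_db(y, noise_var, signal_power=1.0):
    """Second-moment SNR estimate from E[y^2] = h^2 * P + noise_var (no pilots)."""
    rx_power = sum(v * v for v in y) / len(y)
    gain_sq = max(rx_power - noise_var, 1e-12) / signal_power
    return 10.0 * math.log10(gain_sq * signal_power / noise_var)

if __name__ == "__main__":
    rng = random.Random(0)
    latent = [rng.uniform(-3, 3) for _ in range(4096)]  # stand-in for encoder output
    x = power_normalize(latent)
    h, noise_var = 0.8, 0.1  # true SNR = 0.64 / 0.1, about 8.06 dB
    y = slow_fading_channel(x, h, noise_var, rng)
    print(f"estimated SNR: {estimate_snr_db(y, noise_var):.2f} dB")
```

Because the transmit power is fixed by normalization, the received power minus the known noise power reveals h^2 without any pilot symbols; the estimate tightens as the block length grows.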

NeRF-NQA: No-Reference Quality Assessment for Scenes Generated by NeRF and Neural View Synthesis Methods

Dec 11, 2024

Qiang Qu, Hanxue Liang, Xiaoming Chen, Yuk Ying Chung, Yiran Shen

Abstract:Neural View Synthesis (NVS) has demonstrated efficacy in generating high-fidelity dense viewpoint videos from an image set with sparse views. However, existing quality assessment methods such as PSNR, SSIM, and LPIPS are not tailored to scenes with dense viewpoints synthesized by NVS and NeRF variants; as a result, they often fall short in capturing the perceptual quality, including the spatial and angular aspects, of NVS-synthesized scenes. Furthermore, the lack of dense ground-truth views makes full-reference quality assessment of NVS-synthesized scenes challenging. For instance, datasets such as LLFF provide only sparse images, which are insufficient for complete full-reference assessment. To address these issues, we propose NeRF-NQA, the first no-reference quality assessment method for densely observed scenes synthesized by NVS and NeRF variants. NeRF-NQA employs a joint quality assessment strategy, integrating both viewwise and pointwise approaches, to evaluate the quality of NVS-generated scenes. The viewwise approach assesses the spatial quality of each individual synthesized view and the overall inter-view consistency, while the pointwise approach focuses on the angular qualities of scene surface points and their compound inter-point quality. Extensive evaluations compare NeRF-NQA with 23 mainstream visual quality assessment methods (from the fields of image, video, and light-field assessment). The results demonstrate that NeRF-NQA significantly outperforms existing assessment methods and shows substantial superiority in assessing NVS-synthesized scenes without references. An implementation of this paper is available at https://github.com/VincentQQu/NeRF-NQA.

* IEEE Transactions on Visualization and Computer Graphics, vol. 30, no. 5, pp. 2129-2139, May 2024
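
NeRF-NQA is positioned against full-reference metrics such as PSNR. A minimal PSNR sketch for 8-bit grayscale images (represented as nested lists, an assumption made for self-containment) makes the limitation concrete: the function cannot be evaluated without a reference view, which sparse datasets such as LLFF do not provide for every densely synthesized viewpoint.

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """Full-reference PSNR in dB between two equally sized grayscale images.

    Both inputs are lists of rows of pixel values; a reference is mandatory.
    """
    if len(reference) != len(distorted) or any(
        len(r) != len(d) for r, d in zip(reference, distorted)
    ):
        raise ValueError("images must have the same shape")
    sq_err = [
        (float(r) - float(d)) ** 2
        for row_r, row_d in zip(reference, distorted)
        for r, d in zip(row_r, row_d)
    ]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

ref = [[100, 120], [140, 160]]
dist = [[110, 110], [150, 150]]  # every pixel off by 10, so MSE = 100
print(round(psnr(ref, dist), 2))  # 28.13
```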

EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision

Dec 10, 2024

Qiang Qu, Xiaoming Chen, Yuk Ying Chung, Yiran Shen

Abstract:Event-stream representation is the first step for many computer vision tasks using event cameras. It converts asynchronous event streams into a formatted structure so that conventional machine learning models can be applied easily. However, most state-of-the-art event-stream representations are manually designed, and their quality cannot be guaranteed due to the noisy nature of event streams. In this paper, we introduce a data-driven approach aimed at enhancing the quality of event-stream representations. Our approach begins by introducing a new event-stream representation based on spatial-temporal statistics, denoted EvRep. Subsequently, we theoretically derive the intrinsic relationship between asynchronous event streams and synchronous video frames. Building on this theoretical relationship, we train a representation generator, RepGen, in a self-supervised manner with EvRep as input. Finally, the event streams are converted into high-quality representations, termed EvRepSL, by passing them through the learned RepGen (without the need for fine-tuning or retraining). Our methodology is rigorously validated through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets (captured with various types of event cameras). The experimental results highlight not only our approach's superior performance over existing event-stream representations but also its versatility, being agnostic to different event cameras and tasks.

* IEEE Transactions on Image Processing, vol. 33, pp. 6579-6591, 2024
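
The abstract describes EvRep only as a representation "based on spatial-temporal statistics" that turns asynchronous events into a formatted structure. The sketch below is a hypothetical example of such a statistics grid, not the paper's exact EvRep: it assumes events arrive as `(x, y, t, p)` tuples with polarity `p` of +1 or -1, and it builds per-pixel ON counts, OFF counts, and mean normalized timestamps.

```python
from collections import namedtuple

Event = namedtuple("Event", "x y t p")  # p is +1 (ON) or -1 (OFF)

def event_statistics(events, width, height):
    """Convert an asynchronous event stream into three synchronous per-pixel grids.

    Hypothetical channels (illustration only, not the paper's exact EvRep):
      ON-event count, OFF-event count, mean timestamp normalized to [0, 1].
    """
    counts_on = [[0] * width for _ in range(height)]
    counts_off = [[0] * width for _ in range(height)]
    t_sum = [[0.0] * width for _ in range(height)]
    if not events:
        return counts_on, counts_off, t_sum
    t0 = min(e.t for e in events)
    span = (max(e.t for e in events) - t0) or 1.0  # avoid division by zero
    for e in events:
        if e.p > 0:
            counts_on[e.y][e.x] += 1
        else:
            counts_off[e.y][e.x] += 1
        t_sum[e.y][e.x] += (e.t - t0) / span
    mean_t = [
        [t_sum[r][c] / (counts_on[r][c] + counts_off[r][c])
         if counts_on[r][c] + counts_off[r][c] else 0.0
         for c in range(width)]
        for r in range(height)
    ]
    return counts_on, counts_off, mean_t

events = [Event(0, 0, 0.0, 1), Event(0, 0, 10.0, -1), Event(1, 0, 2.5, 1)]
on, off, mean_t = event_statistics(events, width=2, height=1)
print(on, off, mean_t)  # [[1, 1]] [[1, 0]] [[0.5, 0.25]]
```

Such hand-designed grids are the kind of input the abstract describes feeding to a learned generator (RepGen) to obtain a higher-quality representation.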

Light Field Image Quality Assessment With Auxiliary Learning Based on Depthwise and Anglewise Separable Convolutions

Dec 10, 2024

Qiang Qu, Xiaoming Chen, Vera Chung, Zhibo Chen

Abstract:In multimedia broadcasting, no-reference image quality assessment (NR-IQA) is used to indicate the user-perceived quality of experience (QoE) and to support intelligent data transmission while optimizing user experience. This paper proposes an improved no-reference light field image quality assessment (NR-LFIQA) metric for future immersive media broadcasting services. First, we extend the concept of depthwise separable convolution (DSC) to the spatial domain of light field images (LFIs) and introduce "light field depthwise separable convolution (LF-DSC)", which can extract an LFI's spatial features efficiently. Second, we further theoretically extend LF-DSC to the angular space of LFIs and introduce the novel concept of "light field anglewise separable convolution (LF-ASC)", which is capable of extracting both spatial and angular features for comprehensive quality assessment with low complexity. Third, we define spatial and angular feature estimation as auxiliary tasks that aid the primary NR-LFIQA task by providing spatial and angular quality features as hints. To the best of our knowledge, this work is the first exploration of deep auxiliary learning with spatial-angular hints for NR-LFIQA. Experiments were conducted on mainstream LFI datasets such as Win5-LID and SMART, with comparisons to mainstream full-reference IQA metrics as well as state-of-the-art NR-LFIQA methods. The experimental results show that the proposed metric yields 42.86% and 45.95% smaller overall prediction errors than the second-best benchmarking metric on Win5-LID and SMART, respectively. In some challenging cases with particular distortion types, the proposed metric can reduce errors by more than 60%.

* IEEE Transactions on Broadcasting, vol. 67, no. 4, pp. 837-850, Dec. 2021
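
LF-DSC and LF-ASC build on depthwise separable convolution (DSC), whose efficiency comes from factorizing a standard convolution into a per-channel (depthwise) k x k filter followed by a 1 x 1 (pointwise) channel mix. The arithmetic below counts weights only (biases ignored) for an ordinary 2-D layer; it does not model the light field spatial or angular extensions, and the layer sizes are arbitrary examples.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 pointwise mix (no bias)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)        # 73728
dsc = depthwise_separable_params(k, c_in, c_out)  # 576 + 8192 = 8768
print(std, dsc, round(std / dsc, 2))              # 73728 8768 8.41
```

The saving ratio simplifies to 1 / (1/c_out + 1/k^2), so for 3 x 3 kernels it approaches 9 as the number of output channels grows.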

Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels

Nov 20, 2024

Haodong Chen, Runnan Chen, Qiang Qu, Zhaoqing Wang, Tongliang Liu, Xiaoming Chen, Yuk Ying Chung

Abstract:Recent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as floating primitives and over-reconstruction, remain challenging. Current methods address these issues by refining scene structure, enhancing geometric representations, addressing blur in training images, improving rendering consistency, and optimizing density control, yet the role of kernel design remains underexplored. We identify the soft boundaries of Gaussian ellipsoids as one of the causes of these artifacts, limiting detail capture in high-frequency regions. To bridge this gap, we introduce 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to achieve sharper and more precise results, particularly in high-frequency regions. Through evaluations on three datasets, 3DLS demonstrates state-of-the-art fidelity and accuracy, along with a 30% FPS improvement over baseline 3DGS. The implementation will be made publicly available upon acceptance.
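
The abstract attributes blurring artifacts partly to the soft boundaries of Gaussian kernels and replaces them with linear kernels, without giving the linear kernel's exact form here. As an assumption, the sketch below uses a 1-D tent function `max(0, 1 - |r| / R)` as a stand-in for a linear kernel and compares its radial falloff with a unit-peak Gaussian; the parameter values are arbitrary.

```python
import math

def gaussian_kernel(r, sigma=1.0):
    """Unit-peak Gaussian falloff at distance r: never exactly zero."""
    return math.exp(-0.5 * (r / sigma) ** 2)

def linear_kernel(r, radius=3.0):
    """Hypothetical tent-shaped falloff: decays linearly, exactly zero beyond radius."""
    return max(0.0, 1.0 - abs(r) / radius)

for r in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(f"r={r:.1f}  gaussian={gaussian_kernel(r):.4f}  linear={linear_kernel(r):.4f}")
```

The Gaussian only approaches zero asymptotically, whereas the tent reaches exactly zero at r = R, giving each primitive a well-defined edge. Real splatting renderers also apply opacity thresholds, so this comparison is only qualitative.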

Adaptively Augmented Consistency Learning: A Semi-supervised Segmentation Framework for Remote Sensing

Nov 14, 2024

Hui Ye, Haodong Chen, Xiaoming Chen, Vera Chung

Abstract:Remote sensing (RS) involves the acquisition of data about objects or areas from a distance, primarily to monitor environmental changes, manage resources, and support planning and disaster response. A significant challenge in RS segmentation is the scarcity of high-quality labeled images: the diversity and complexity of RS images make pixel-level annotation difficult, which hinders the development of effective supervised segmentation algorithms. To solve this problem, we propose Adaptively Augmented Consistency Learning (AACL), a semi-supervised segmentation framework designed to enhance RS segmentation accuracy under conditions of limited labeled data. AACL extracts additional information embedded in unlabeled images through Uniform Strength Augmentation (USAug) and Adaptive Cut-Mix (AdaCM). Evaluations across various RS datasets demonstrate that AACL achieves competitive performance in semi-supervised segmentation, showing up to a 20% improvement in specific categories and a 2% increase in overall performance compared to state-of-the-art frameworks.

* International Conference on Neural Information Processing 2024
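
AACL's AdaCM builds on Cut-Mix, which pastes a rectangular region of one training image into another. The abstract does not say how the mixing is adapted, so the sketch below shows only the standard, non-adaptive operation for segmentation, where the same box must be applied to the image and to its label or pseudo-label mask so that pixels and labels stay aligned. Images are assumed to be nested lists, and the helper names are hypothetical.

```python
import math
import random

def random_box(height, width, area_ratio, rng):
    """Sample a box covering about `area_ratio` (0 < ratio <= 1) of the image."""
    side = math.sqrt(area_ratio)
    bh = max(1, int(height * side))
    bw = max(1, int(width * side))
    top = rng.randrange(0, height - bh + 1)
    left = rng.randrange(0, width - bw + 1)
    return top, left, bh, bw

def cutmix_segmentation(img_a, mask_a, img_b, mask_b, box):
    """Paste the box region of (img_b, mask_b) into copies of (img_a, mask_a).

    box = (top, left, height, width); the inputs are not modified.
    """
    top, left, h, w = box
    out_img = [row[:] for row in img_a]
    out_mask = [row[:] for row in mask_a]
    for r in range(top, top + h):
        for c in range(left, left + w):
            out_img[r][c] = img_b[r][c]
            out_mask[r][c] = mask_b[r][c]
    return out_img, out_mask

rng = random.Random(0)
a = [[0] * 4 for _ in range(4)]
b = [[1] * 4 for _ in range(4)]
box = random_box(4, 4, 0.25, rng)  # a 2 x 2 box at a random position
img, mask = cutmix_segmentation(a, a, b, b, box)
print(sum(map(sum, img)))  # 4
```

In a semi-supervised setting, `mask_a` and `mask_b` would typically be pseudo-labels predicted for unlabeled images.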

Electromagnetic Modeling and Capacity Analysis of Rydberg Atom-Based MIMO System

Nov 13, 2024

Shuai S. A. Yuan, Xinyi Y. I. Xu, Jinpeng Yuan, Guoda Xie, Chongwen Huang, Xiaoming Chen, Zhixiang Huang, Wei E. I. Sha

Abstract:Rydberg atom-based antennas exploit the quantum properties of highly excited Rydberg atoms, providing unique advantages over classical antennas, such as high sensitivity, a broad frequency range, and compact size. Despite increasing interest in their application to antenna and communication engineering, two key properties, namely the lack of polarization multiplexing and isotropic reception without mutual coupling, remain unexplored in the analysis of Rydberg atom-based spatial multiplexing, i.e., multiple-input multiple-output (MIMO), communications. Generally, the design considerations for any antenna, even an atomic one, can be reduced to factors such as radiation pattern, efficiency, and polarization, allowing them to be seamlessly integrated into existing system models. In this letter, we extract the antenna properties from the relevant quantum characteristics, enabling electromagnetic modeling and capacity analysis of Rydberg MIMO systems in both far-field and near-field scenarios. With a ray-based method for far-field analysis and the dyadic Green's function for near-field calculation, our results indicate that Rydberg atom-based antenna arrays offer specific advantages over classical dipole-type arrays in single-polarization MIMO communications.
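
The letter compares Rydberg and dipole arrays through MIMO capacity. As a sketch under stated assumptions, the standard capacity expression for equal power per transmit antenna and no channel knowledge at the transmitter, C = log2 det(I + (SNR / N_t) H H^H), can be evaluated as below. The channel matrix H is taken as given; modeling how atomic antenna properties (patterns, single polarization, absence of mutual coupling) shape H, which is the letter's actual contribution, is not attempted.

```python
import math

def gram(h):
    """Return H H^H for H given as a list of rows (real or complex entries)."""
    n_r, n_t = len(h), len(h[0])
    return [[sum(h[i][k] * h[j][k].conjugate() for k in range(n_t)) for j in range(n_r)]
            for i in range(n_r)]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def mimo_capacity(h, snr_linear):
    """Capacity in bit/s/Hz: log2 det(I + (snr / Nt) H H^H), equal power per antenna."""
    n_r, n_t = len(h), len(h[0])
    g = gram(h)
    m = [[(1.0 if i == j else 0.0) + snr_linear / n_t * g[i][j] for j in range(n_r)]
         for i in range(n_r)]
    # I + c * H H^H is Hermitian positive definite, so its determinant is real and > 0.
    return math.log2(determinant(m).real)

print(round(mimo_capacity([[1, 0], [0, 1]], 10.0), 3))  # 5.17 (two parallel streams)
print(round(mimo_capacity([[1, 1], [1, 1]], 10.0), 3))  # 4.392 (rank-one channel)
```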

Exploiting On-Orbit Characteristics for Joint Parameter and Channel Tracking in LEO Satellite Communications

Oct 29, 2024

Chenlan Lin, Xiaoming Chen, Zhaoyang Zhang

Abstract:In high-dynamic low earth orbit (LEO) satellite communication (SATCOM) systems, frequent channel state information (CSI) acquisition consumes a large number of pilots, which is intolerable in resource-limited SATCOM systems. To tackle this problem, we propose to track the state-dependent parameters, including the Doppler shift and channel angles, by exploiting the physical on-orbit mobility characteristics of the LEO satellite and the approximate mobility characteristics of ground users (GUs), respectively. As a prerequisite for tracking, we formulate state evolution models for the kinematic (state) parameters of both the satellite and the GUs, along with measurement models that describe the relationship between the state-dependent parameters and the states. Then, a rough estimate of the state-dependent parameters is obtained and used as the measurement in the subsequent state tracking. Concurrently, the measurement error covariance is predicted based on the formulated Cramér-Rao lower bound (CRLB). Finally, with extended Kalman filter (EKF)-based state tracking as the bridge, the Doppler shift and channel angles are further updated and the CSI is acquired. Simulation results show that, compared with rough estimation methods, the proposed joint parameter and channel tracking (JPCT) algorithm performs much better in estimating the state-dependent parameters. Moreover, for CSI acquisition, the proposed algorithm requires a shorter pilot sequence than benchmark methods for a given estimation accuracy.

* IEEE Transactions on Wireless Communications, 2024
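
The JPCT algorithm tracks kinematic states with an extended Kalman filter (EKF), using rough parameter estimates as measurements and CRLB-based predictions of their error covariance. The EKF is needed because the paper's measurement models are nonlinear; the sketch below deliberately simplifies to a linear Kalman filter (the case an EKF reduces to when both models are linear) that tracks the Doppler shift directly with an assumed constant-rate state model. The state layout, the noise parameters `q` and `r`, and the function name are illustrative assumptions.

```python
import random

def kf_step(x, P, z, dt, q, r):
    """One predict/update cycle of a linear Kalman filter.

    State x = [doppler, doppler_rate]; z is a rough Doppler measurement whose
    error variance r would, in the paper's scheme, come from the CRLB.
    """
    # Predict with the constant-rate transition F = [[1, dt], [0, 1]].
    f, fd = x
    x_pred = [f + dt * fd, fd]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11  # F P F^T
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11
    # Add white-acceleration process noise q * [[dt^3/3, dt^2/2], [dt^2/2, dt]].
    a00 += q * dt ** 3 / 3
    a01 += q * dt ** 2 / 2
    a10 += q * dt ** 2 / 2
    a11 += q * dt
    # Update with H = [1, 0]: innovation y, innovation variance s, gain (k0, k1).
    y = z - x_pred[0]
    s = a00 + r
    k0, k1 = a00 / s, a10 / s
    x_new = [x_pred[0] + k0 * y, x_pred[1] + k1 * y]
    P_new = [[(1 - k0) * a00, (1 - k0) * a01],
             [a10 - k1 * a00, a11 - k1 * a01]]  # (I - K H) P_pred
    return x_new, P_new

if __name__ == "__main__":
    rng = random.Random(0)
    x, P = [0.0, 0.0], [[1e6, 0.0], [0.0, 1e6]]  # vague initial belief
    r = 4.0  # measurement variance, standing in for a CRLB prediction
    for k in range(1, 101):
        true_doppler = 1000.0 + 5.0 * k          # hypothetical linear drift
        z = true_doppler + rng.gauss(0.0, r ** 0.5)
        x, P = kf_step(x, P, z, dt=1.0, q=0.01, r=r)
    print(f"doppler={x[0]:.1f}, rate={x[1]:.2f} per step")
```

A smaller `r` (a tighter CRLB) makes the filter trust each rough estimate more, which is why predicting the measurement covariance matters for tracking accuracy.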

BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices

Sep 25, 2024

Yongqi Xu, Yujian Lee, Gao Yi, Bosheng Liu, Yucong Chen, Peng Liu, Jigang Wu, Xiaoming Chen, Yinhe Han

Abstract:Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback, however, is their high computational complexity and memory consumption, which make real-time execution on embedded platforms infeasible given their limited hardware resources. Block floating point (BFP) quantization is one of the representative compression approaches for reducing the memory and computational burden, owing to its capability to effectively capture the broad data distribution of DNN models. Unfortunately, prior works on BFP-based quantization choose the block size and the precision empirically to preserve accuracy. In this paper, we develop a BFP-based bitwidth-aware analytical modeling framework (called "BitQ") for the best BFP implementation of DNN inference on embedded platforms. We formulate and solve an optimization problem to identify the optimal BFP block size and bitwidth distribution by trading off accuracy against performance loss. Experimental results show that, compared with an equal-bitwidth setting, the BFP DNNs with optimized bitwidth allocation provide efficient computation while preserving accuracy on well-known benchmarks. The source code and data are available at https://github.com/Cheliosoops/BitQ.
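
BitQ optimizes the BFP block size and bitwidth; the underlying BFP format itself can be sketched as follows. Each block of values shares one exponent derived from its largest magnitude, and each value keeps only a short signed mantissa. This is a generic BFP quantizer written for illustration, not BitQ's framework, and the exponent and rounding conventions chosen here are assumptions.

```python
import math

def bfp_quantize(values, block_size, mantissa_bits):
    """Quantize a 1-D list with block floating point and return dequantized values.

    Each block shares one exponent taken from its largest magnitude; each value
    keeps a signed integer mantissa of `mantissa_bits` bits (sign included).
    """
    out = []
    q_max = 2 ** (mantissa_bits - 1) - 1
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0:
            out.extend([0.0] * len(block))
            continue
        shared_exp = math.frexp(max_abs)[1]  # guarantees max_abs < 2 ** shared_exp
        step = 2.0 ** (shared_exp - (mantissa_bits - 1))
        for v in block:
            # round() is round-half-to-even; clamp to the signed mantissa range.
            q = max(-q_max, min(q_max, round(v / step)))
            out.append(q * step)
    return out

data = [0.5, -0.25, 0.125, 0.0625]
print(bfp_quantize(data, block_size=4, mantissa_bits=4))  # [0.5, -0.25, 0.125, 0.0]
print(bfp_quantize(data, block_size=2, mantissa_bits=4))  # [0.5, -0.25, 0.125, 0.0625]
```

With the same 4-bit mantissas, the smaller block preserves 0.0625 because its block gets its own, smaller shared exponent; the price is storing one exponent per block. Choosing block size and bitwidth under such accuracy/cost trade-offs is what BitQ's optimization addresses.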

Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition

Sep 15, 2024

Zongyou Yu, Qiang Qu, Xiaoming Chen, Chen Wang

Abstract:Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods depend heavily on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. In particular, we evaluate the ability of GPT-4o, GPT-4 Turbo, and two other open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs, especially when enhanced with well-designed prompts, significantly improve event-based zero-shot recognition performance. Notably, GPT-4o outperforms the compared models and exceeds the recognition accuracy of state-of-the-art event-based zero-shot methods on N-ImageNet by five orders of magnitude. The implementation of this paper is available at https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM.
