Ping Zhang

Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models

Oct 26, 2024

Senran Fan, Zhicheng Bao, Chen Dong, Haotai Liang, Xiaodong Xu, Ping Zhang

Abstract: End-to-end image communication systems have been widely studied in the academic community. The escalating demands on image communication systems in terms of data volume, environmental complexity, and task precision require enhanced communication efficiency, anti-noise ability, and semantic fidelity. Therefore, we propose a novel paradigm based on Semantic Feature Decomposition (SeFD) for integrating semantic communication with large-scale visual generation models to achieve high-performance, highly interpretable, and controllable image communication. Following this paradigm, we propose a Texture-Color based Semantic Communication system of Images (TCSCI). TCSCI decomposes images into their natural language description (text), texture, and color semantic features at the transmitter. These features are transmitted over the wireless channel, and at the receiver a large-scale visual generation model restores the image from the received features. TCSCI can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication while ensuring the interpretability and editability of the transmission process. Experiments demonstrate that TCSCI outperforms traditional image communication systems and existing semantic communication systems under extreme compression, with good anti-noise performance and interpretability.

* 13 pages, 13 figures

Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Oct 21, 2024

Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang

Abstract: The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.

Wireless Environment Information Sensing, Feature, Semantic, and Knowledge: Four Steps Towards 6G AI-Enabled Air Interface

Sep 28, 2024

Jianhua Zhang, Yichen Cai, Li Yu, Zhen Zhang, Yuxiang Zhang, Jialin Wang, Tao Jiang, Liang Xia, Ping Zhang

Abstract: Air interface technology plays a crucial role in optimizing the communication quality for users. To address the challenges that radio channel variations pose to air interface design, this article proposes a framework of wireless environment information-aided 6G AI-enabled air interface (WEI-6G AI$^{2}$), which actively acquires real-time environment details to facilitate channel fading prediction and communication technology optimization. Specifically, we first outline the role of WEI in supporting the 6G AI$^{2}$ in scenario adaptability, real-time inference, and proactive action. Then, WEI is delineated into four progressive steps: raw sensing data, features obtained by data dimensionality reduction, semantics tailored to tasks, and knowledge that quantifies the environmental impact on the channel. To validate the availability and compare the effect of different types of WEI, a path loss prediction use case is designed. The results demonstrate that leveraging environment knowledge requires only 2.2 ms of model inference time, which can effectively support real-time design for future 6G AI$^{2}$. Additionally, WEI can reduce the pilot overhead by 25%. Finally, several open issues are pointed out, including multi-modal sensing data synchronization and the construction of information extraction methods.

Joint Source-Channel Coding: Fundamentals and Recent Progress in Practical Designs

Sep 26, 2024

Deniz Gündüz, Michèle A. Wigger, Tze-Yang Tung, Ping Zhang, Yong Xiao

Abstract: Semantic- and task-oriented communication has emerged as a promising approach to reducing the latency and bandwidth requirements of next-generation mobile networks by transmitting only the most relevant information needed to complete a specific task at the receiver. This is particularly advantageous for machine-oriented communication of high data rate content, such as images and videos, where the goal is rapid and accurate inference, rather than perfect signal reconstruction. While semantic- and task-oriented compression can be implemented in conventional communication systems, joint source-channel coding (JSCC) offers an alternative end-to-end approach by optimizing compression and channel coding together, or even directly mapping the source signal to the modulated waveform. Although all digital communication systems today rely on separation, thanks to its modularity, JSCC is known to achieve higher performance in finite blocklength scenarios, and to avoid the cliff and levelling-off effects in time-varying channel scenarios. This article provides an overview of the information theoretic foundations of JSCC, surveys practical JSCC designs over the decades, and discusses the reasons for their limited adoption in practical systems. We then examine the recent resurgence of JSCC, driven by the integration of deep learning techniques, particularly through DeepJSCC, highlighting its many surprising advantages in various scenarios. Finally, we discuss why it may be time to reconsider today's strictly separate architectures, and reintroduce JSCC to enable high-fidelity, low-latency communications in critical applications such as autonomous driving, drone surveillance, or wearable systems.

* Under review for possible publication

MambaJSCC: Adaptive Deep Joint Source-Channel Coding with Generalized State Space Model

Sep 25, 2024

Tong Wu, Zhiyong Chen, Meixia Tao, Yaping Sun, Xiaodong Xu, Wenjun Zhang, Ping Zhang

Abstract: Lightweight and efficient neural network models for deep joint source-channel coding (JSCC) are crucial for semantic communications. In this paper, we propose a novel JSCC architecture, named MambaJSCC, that achieves state-of-the-art performance with low computational and parameter overhead. MambaJSCC utilizes the visual state space model with channel adaptation (VSSM-CA) blocks as its backbone for transmitting images over wireless channels, where the VSSM-CA primarily consists of the generalized state space models (GSSM) and the zero-parameter, zero-computational channel adaptation method (CSI-ReST). We design the GSSM module, leveraging reversible matrix transformations to express generalized scan expanding operations, and theoretically prove that two GSSM modules can effectively capture global information. We discover that GSSM inherently possesses the ability to adapt to channels, a form of endogenous intelligence. Based on this, we design the CSI-ReST method, which injects channel state information (CSI) into the initial state of GSSM to utilize its native response, and into the residual state to mitigate CSI forgetting, enabling effective channel adaptation without introducing additional computational and parameter overhead. Experimental results show that MambaJSCC not only outperforms existing JSCC methods (e.g., SwinJSCC) across various scenarios but also significantly reduces parameter size, computational overhead, and inference delay.

* submitted to IEEE Journal

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition

Sep 24, 2024

Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Li Zhang, Wei-Lun Chao

Abstract: Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of new methods. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which method to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain quite similar accuracy in the low-shot benchmark VTAB-1K. This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, despite their similar accuracy, we find that PETL methods make different mistakes and different high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementariness) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes: it achieves comparable and sometimes better accuracy than full FT, using far fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve the robustness of a pre-trained model (e.g., a CLIP backbone) to distribution shifts. Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully FT model can achieve a better balance between downstream and out-of-distribution performance, suggesting a future research direction for PETL.

* Code is available at https://github.com/OSU-MLB/PETL_Vision

Fine-Tuning is Fine, if Calibrated

Sep 24, 2024

Zheda Mai, Arpita Chowdhury, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su (+1 more)

Abstract: Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the model's accuracy in the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?" To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.

* The first three authors contribute equally
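
The post-processing calibration mentioned in the abstract can be illustrated with a minimal sketch. It assumes the calibration takes the form of a single constant offset added to the logits of classes that were absent from fine-tuning; the abstract does not specify the exact form, so the function name `calibrated_predict`, the offset `gamma`, and the toy logits below are all hypothetical.

```python
def calibrated_predict(logits, absent_classes, gamma):
    """Return the predicted class after shifting absent-class logits.

    logits: list of raw scores, one per class (index = class id).
    absent_classes: set of class ids not seen during fine-tuning.
    gamma: constant offset (an assumption of this sketch) added to the
           logits of absent classes to compensate for their lower scale.
    """
    adjusted = [
        score + gamma if cls in absent_classes else score
        for cls, score in enumerate(logits)
    ]
    # Index of the largest adjusted logit.
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Toy example: classes 0 and 1 were fine-tuned, classes 2 and 3 were not.
toy_logits = [4.0, 3.5, 1.0, 2.9]
absent = {2, 3}

raw_prediction = calibrated_predict(toy_logits, absent, gamma=0.0)
calibrated_prediction = calibrated_predict(toy_logits, absent, gamma=1.5)
print(raw_prediction, calibrated_prediction)
```

In this toy case the uncalibrated prediction falls on a fine-tuned class, while the offset lets an absent class win. In practice the offset would have to be chosen on held-out data; how the authors select it is not stated in the abstract.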

Clutter Suppression, Time-Frequency Synchronization, and Sensing Parameter Association in Asynchronous Perceptive Vehicular Networks

Sep 02, 2024

Xiao-Yang Wang, Shaoshi Yang, Jianhua Zhang, Christos Masouros, Ping Zhang

Abstract: Significant challenges remain for realizing precise positioning and velocity estimation in perceptive vehicular networks (PVN) enabled by the emerging integrated sensing and communication technology. First, the complicated wireless propagation environment generates undesired clutter, which degrades the vehicular sensing performance and increases the computational complexity. Second, in practical PVN, multiple types of individually estimated parameters are not well associated with specific vehicles, which may cause error propagation in multiple-vehicle positioning. Third, radio transceivers in a PVN are naturally asynchronous, which causes strong range and velocity ambiguity. To overcome these challenges, 1) we introduce a moving target indication based joint clutter suppression and sensing algorithm, and analyze its clutter-suppression performance and the Cramér-Rao lower bound of the paired range-velocity estimation upon using the proposed clutter suppression algorithm; 2) we design algorithms for associating individual direction-of-arrival estimates with the paired range-velocity estimates based on "domain transformation"; 3) we propose the first viable carrier frequency offset (CFO) and time offset (TO) estimation algorithm that supports passive vehicular sensing in non-line-of-sight environments. This algorithm treats the delay-Doppler spectrum of the signals reflected by static objects as an environment-specific "fingerprint spectrum", which is shown to exhibit a circular shift property upon changing the CFO and/or TO. Then, the CFO and TO are efficiently estimated by acquiring the number of circular shifts, and we also analyse the mean squared error performance of the proposed time-frequency synchronization algorithm. Simulation results demonstrate the performance advantages of our algorithms under diverse configurations, while corroborating the theoretical analysis.

* 18 pages, 13 figures, 3 tables, accepted for publication in IEEE Journal on Selected Areas in Communications, vol. 42, no. 10, Oct. 2024
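
The idea of estimating offsets by "acquiring the number of circular shifts" can be sketched on a toy grid. The example below is not the authors' algorithm: it assumes a noise-free 2D spectrum, treats one axis as delay and the other as Doppler, and recovers the shift by brute-force circular correlation. The names `circular_shift_2d` and `estimate_circular_shift`, and the toy fingerprint, are hypothetical.

```python
def circular_shift_2d(grid, row_shift, col_shift):
    """Circularly shift a 2D grid down by row_shift and right by col_shift."""
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[(r - row_shift) % rows][(c - col_shift) % cols] for c in range(cols)]
        for r in range(rows)
    ]


def estimate_circular_shift(reference, observed):
    """Return the (row_shift, col_shift) that best aligns observed to reference.

    Every candidate shift is scored by the circular correlation between the
    reference and the observed grid; the highest score wins.
    """
    rows, cols = len(reference), len(reference[0])

    def correlation(shift):
        dr, dc = shift
        return sum(
            reference[r][c] * observed[(r + dr) % rows][(c + dc) % cols]
            for r in range(rows)
            for c in range(cols)
        )

    candidates = [(dr, dc) for dr in range(rows) for dc in range(cols)]
    return max(candidates, key=correlation)


# Toy "fingerprint spectrum": a few static reflectors on a 4 x 5 grid.
fingerprint = [
    [0, 1, 0, 0, 0],
    [0, 0, 3, 0, 0],
    [2, 0, 0, 0, 0],
    [0, 0, 0, 0, 5],
]
# Pretend a time offset moved it by 1 delay bin and a CFO by 2 Doppler bins.
received = circular_shift_2d(fingerprint, 1, 2)
estimated_shift = estimate_circular_shift(fingerprint, received)
print(estimated_shift)
```

Converting the recovered bin shifts into a CFO and a TO would require the delay and Doppler resolutions of the system, which the abstract does not give.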

Stochastic Geometry Based Modelling and Analysis of Uplink Cooperative Satellite-Aerial-Terrestrial Networks for Nomadic Communications with Weak Satellite Coverage

Aug 27, 2024

Wen-Yu Dong, Shaoshi Yang, Ping Zhang, Sheng Chen

Abstract: Cooperative satellite-aerial-terrestrial networks (CSATNs), where unmanned aerial vehicles (UAVs) are utilized as nomadic aerial relays (A), are highly valuable for many important applications, such as post-disaster urban reconstruction. In this scenario, direct communication between terrestrial terminals (T) and satellites (S) is often unavailable due to poor propagation conditions for satellite signals, and users tend to congregate in regions of finite size. There is currently a dearth of work in the open literature on the uplink performance analysis of CSATNs operating under the above constraints, and the few contributions on the uplink model terrestrial terminals with a Poisson point process (PPP), relying on the unrealistic assumption of an infinite area. This paper aims to fill the above research gap. First, we propose an innovative stochastic geometry based model to characterize the impact of the finite-size distribution region of terrestrial terminals in the CSATN by jointly using a binomial point process (BPP) and a type-II Matérn hard-core point process (MHCPP). Then, we analyze the relationship between the spatial distribution of the coverage areas of aerial nodes and the finite-size distribution region of terrestrial terminals, thereby deriving the distance distribution of the T-A links. Furthermore, we consider the stochastic nature of the spatial distributions of terrestrial terminals and UAVs, and conduct a thorough analysis of the coverage probability and average ergodic rate of the T-A links under Nakagami fading and the A-S links under shadowed-Rician fading. Finally, the accuracy of our theoretical derivations is confirmed by Monte Carlo simulations. Our research offers fundamental insights into the system-level performance optimization for realistic CSATNs involving nomadic aerial relays and terrestrial terminals confined in a finite-size region.

* 17 pages, 16 pages, 2 tables, accepted to appear in IEEE Journal on Selected Areas in Communications, Aug. 2024
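
The two point processes named in the abstract can be sampled with a short sketch. It assumes a disk-shaped finite region and, for simplicity, thins a fixed-size candidate set, whereas the classical type-II Matérn construction thins a Poisson parent process. Which process models which node type is not stated in the abstract, so the function names and parameter values below are hypothetical.

```python
import math
import random


def sample_bpp_in_disk(num_points, radius, rng):
    """Binomial point process: num_points i.i.d. uniform points in a disk."""
    points = []
    for _ in range(num_points):
        # The square root makes the points uniform over area, not over radius.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def matern_type2_thinning(points, hard_core_distance, rng):
    """Type-II Matern thinning of a candidate point set.

    Each point gets an independent uniform mark. A point is kept only if no
    other point within hard_core_distance carries a smaller mark.
    """
    marks = [rng.random() for _ in points]
    kept = []
    for i, p in enumerate(points):
        dominated = any(
            j != i and marks[j] < marks[i] and math.dist(p, q) < hard_core_distance
            for j, q in enumerate(points)
        )
        if not dominated:
            kept.append(p)
    return kept


rng = random.Random(0)
terminals = sample_bpp_in_disk(num_points=50, radius=100.0, rng=rng)
candidates = sample_bpp_in_disk(num_points=40, radius=100.0, rng=rng)
relays = matern_type2_thinning(candidates, hard_core_distance=30.0, rng=rng)
print(len(terminals), len(relays))
```

The thinning guarantees a minimum separation between retained points, which is the property that makes a hard-core process a natural fit for nodes that cannot be arbitrarily close to each other.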

Rate-Distortion-Perception Controllable Joint Source-Channel Coding for High-Fidelity Generative Communications

Aug 26, 2024

Kailin Tan, Jincheng Dai, Zhenyu Liu, Sixian Wang, Xiaoqi Qin, Wenjun Xu, Kai Niu, Ping Zhang

Abstract: End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, often failing to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perception quality in human communications. Our RDP-JSCC framework integrates a flexible plug-in conditional generative adversarial network (GAN) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions that typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, enabling the customization of appearance realism for each image region at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.
