Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Deniz Gündüz

LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression

Jul 01, 2025

Haotian Wu, Gongpu Chen, Pier Luigi Dragotti, Deniz Gündüz

Abstract:We introduce and validate the lottery codec hypothesis, which states that untrained subnetworks within randomly initialized networks can serve as synthesis networks for overfitted image compression, achieving rate-distortion (RD) performance comparable to trained networks. This hypothesis leads to a new paradigm for image compression by encoding image statistics into the network substructure. Building on this hypothesis, we propose LotteryCodec, which overfits a binary mask to an individual image, leveraging an over-parameterized and randomly initialized network shared by the encoder and the decoder. To address over-parameterization challenges and streamline subnetwork search, we develop a rewind modulation mechanism that improves the RD performance. LotteryCodec outperforms VTM and sets a new state-of-the-art in single-image compression. LotteryCodec also enables adaptive decoding complexity through adjustable mask ratios, offering flexible compression solutions for diverse device constraints and application requirements.

* International Conference on Machine Learning (2025)

Via

Access Paper or Ask Questions

ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications

May 16, 2025

Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Robert Schober, Deniz Gündüz

Abstract:Token communications (TokCom) is an emerging generative semantic communication concept that reduces transmission rates by using context and multimodal large language model (MLLM)-based token processing, with tokens serving as universal semantic units across modalities. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as token domain multiple access (ToDMA), where a large number of devices share a token codebook and a modulation codebook for source and channel coding, respectively. Specifically, each transmitter first tokenizes its source signal and modulate each token to a codeword. At the receiver, compressed sensing is employed first to detect active tokens and the corresponding channel state information (CSI) from the superposed signals. Then, the source token sequences are reconstructed by clustering the token-associated CSI across multiple time slots. In case of token collisions, some active tokens cannot be assigned and some positions in the reconstructed token sequences are empty. We propose to use pre-trained MLLMs to leverage the context, predict masked tokens, and thus mitigate token collisions. Simulation results demonstrate the effectiveness of the proposed ToDMA framework for both text and image transmission tasks, achieving significantly lower latency compared to context-unaware orthogonal communication schemes, while also delivering superior distortion and perceptual quality compared to state-of-the-art context-unaware non-orthogonal communication methods.

* arXiv admin note: text overlap with arXiv:2502.06118

Via

Access Paper or Ask Questions

Realizing Fully-Connected Layers Over the Air via Reconfigurable Intelligent Surfaces

May 02, 2025

Meng Hua, Chenghong Bian, Haotian Wu, Deniz Gündüz

Abstract:By leveraging the waveform superposition property of the multiple access channel, over-the-air computation (AirComp) enables the execution of digital computations through analog means in the wireless domain, leading to faster processing and reduced latency. In this paper, we propose a novel approach to implement a neural network (NN) consisting of digital fully connected (FC) layers using physically reconfigurable hardware. Specifically, we investigate reconfigurable intelligent surfaces (RISs)-assisted multiple-input multiple-output (MIMO) systems to emulate the functionality of a NN for over-the-air inference. In this setup, both the RIS and the transceiver are jointly configured to manipulate the ambient wireless propagation environment, effectively reproducing the adjustable weights of a digital FC layer. We refer to this new computational paradigm as \textit{AirFC}. We formulate an imitation error minimization problem between the effective channel created by RIS and a target FC layer by jointly optimizing over-the-air parameters. To solve this non-convex optimization problem, an extremely low-complexity alternating optimization algorithm is proposed, where semi-closed-form/closed-form solutions for all optimization variables are derived. Simulation results show that the RIS-assisted MIMO-based AirFC can achieve competitive classification accuracy. Furthermore, it is also shown that a multi-RIS configuration significantly outperforms a single-RIS setup, particularly in line-of-sight (LoS)-dominated channels.

Via

Access Paper or Ask Questions

Speculative Sampling via Exponential Races

Apr 21, 2025

Szymon Kobus, Deniz Gündüz

Abstract:Speculative decoding accelerates large language model inference using a smaller draft model. In this paper, we establish a surprising connection between speculative decoding and channel simulation, which aims at simulating a noisy channel using as few bits as possible. This connection allows us to provide an information-theoretic analysis of the speed up that can be achieved by speculative decoding. Leveraging this link, we derive an explicit relation between generation speed-up and the number of tokens $k$ generated by the draft model for large $k$, which serves as an upper bound for all $k$. We also propose a novel speculative decoding method via exponential race ERSD that matches state-of-the-art performance.

Via

Access Paper or Ask Questions

A Fully Asynchronous Unsourced Random Access Scheme

Apr 15, 2025

Mert Ozates, Mohammad Kazemi, Gianluigi Liva, Deniz Gündüz

Abstract:We investigate fully asynchronous unsourced random access (URA), and propose a high-performing scheme that employs on-off division multiple access (ODMA). In this scheme, active users distribute their data over the transmit block based on a sparse transmission pattern without any limitations on the starting time. At the receiver side, we adopt a double sliding-window decoding approach, utilizing a smaller inner decoding window of two block lengths within a larger outer window to enhance the interference cancellation process. Within the inner window, the receiver iteratively applies preamble-free joint starting time and pattern detection, single-user decoding, and successive interference cancellation operations. A notable feature of the proposed scheme is its elimination of the need for a preamble for starting time detection; this is achieved using ODMA transmission patterns. Numerical results demonstrate that the proposed asynchronous URA scheme outperforms existing alternatives.

Via

Access Paper or Ask Questions

SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models

Mar 16, 2025

Jiakang Chen, Selim F. Yilmaz, Di You, Pier Luigi Dragotti, Deniz Gündüz

Abstract:Joint source-channel coding systems based on deep neural networks (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods primarily focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. This can lead to severe perceptual degradation when transmitting images under extreme conditions, such as low bandwidth compression ratios (BCRs) and low signal-to-noise ratios (SNRs). In this work, we propose SING, a novel two-stage JSCC framework that formulates the recovery of high-quality source images from corrupted reconstructions as an inverse problem. Depending on the availability of information about the DeepJSCC encoder/decoder and the channel at the receiver, SING can either approximate the stochastic degradation as a linear transformation, or leverage invertible neural networks (INNs) for precise modeling. Both approaches enable the seamless integration of diffusion models into the reconstruction process, enhancing perceptual quality. Experimental results demonstrate that SING outperforms DeepJSCC and other approaches, delivering superior perceptual quality even under extremely challenging conditions, including scenarios with significant distribution mismatches between the training and test data.

Via

Access Paper or Ask Questions

Task-Agnostic Semantic Communication with Multimodal Foundation Models

Feb 25, 2025

Jiangjing Hu, Haotian Wu, Wenjing Zhang, Fengyu Wang, Wenjun Xu, Hui Gao, Deniz Gündüz

Abstract:Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a task-agnostic SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient semantic communications under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP tokens encoding. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a $41\%$ improvement in zero-shot accuracy at a low signal-to-noise ratio. Meanwhile, SemCLIP reduces bandwidth usage by more than $50$-fold compared to different image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.

Via

Access Paper or Ask Questions

Token-Domain Multiple Access: Exploiting Semantic Orthogonality for Collision Mitigation

Feb 10, 2025

Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Deniz Gündüz

Abstract:Token communications is an emerging generative semantic communication concept that reduces transmission rates by using context and transformer-based token processing, with tokens serving as universal semantic units. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as ToDMA, where a large number of devices share a tokenizer and a modulation codebook for source and channel coding, respectively. Specifically, the source signal is tokenized into sequences, with each token modulated into a codeword. Codewords from multiple devices are transmitted simultaneously, resulting in overlap at the receiver. The receiver detects the transmitted tokens, assigns them to their respective sources, and mitigates token collisions by leveraging context and semantic orthogonality across the devices' messages. Simulations demonstrate that the proposed ToDMA framework outperforms context-unaware orthogonal and non-orthogonal communication methods in image transmission tasks, achieving lower latency and better image quality.

Via

Access Paper or Ask Questions

Dynamic Model Fine-Tuning For Extreme MIMO CSI Compression

Jan 30, 2025

Mehdi Sattari, Deniz Gündüz, Tommmy Svensson

Abstract:Efficient channel state information (CSI) compression is crucial in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems due to excessive feedback overhead. Recently, deep learning-based compression techniques have demonstrated superior performance across various data types, including CSI. However, these approaches often experience performance degradation when the data distribution changes due to their limited generalization capabilities. To address this challenge, we propose a model fine-tuning approach for CSI feedback in massive MIMO systems. The idea is to fine-tune the encoder/decoder network models in a dynamic fashion using the recent CSI samples. First, we explore encoder-only fine-tuning, where only the encoder parameters are updated, leaving the decoder and latent parameters unchanged. Next, we consider full-model fine-tuning, where the encoder and decoder models are jointly updated. Unlike encoder-only fine-tuning, full-model fine-tuning requires the updated decoder and latent parameters to be transmitted to the decoder side. To efficiently handle this, we propose different prior distributions for model updates, such as uniform and truncated Gaussian to entropy code them together with the compressed CSI and account for additional feedback overhead imposed by conveying the model updates. Moreover, we incorporate quantized model updates during fine-tuning to reflect the impact of quantization in the deployment phase. Our results demonstrate that full-model fine-tuning significantly enhances the rate-distortion (RD) performance of neural CSI compression. Furthermore, we analyze how often the full-model fine-tuning should be applied in a new wireless environment and identify an optimal period interval for achieving the best RD trade-off.

Via

Access Paper or Ask Questions

Semantics-Guided Diffusion for Deep Joint Source-Channel Coding in Wireless Image Transmission

Jan 02, 2025

Maojun Zhang, Haotian Wu, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Deniz Gündüz

Abstract:Joint source-channel coding (JSCC) offers a promising avenue for enhancing transmission efficiency by jointly incorporating source and channel statistics into the system design. A key advancement in this area is the deep joint source and channel coding (DeepJSCC) technique that designs a direct mapping of input signals to channel symbols parameterized by a neural network, which can be trained for arbitrary channel models and semantic quality metrics. This paper advances the DeepJSCC framework toward a semantics-aligned, high-fidelity transmission approach, called semantics-guided diffusion DeepJSCC (SGD-JSCC). Existing schemes that integrate diffusion models (DMs) with JSCC face challenges in transforming random generation into accurate reconstruction and adapting to varying channel conditions. SGD-JSCC incorporates two key innovations: (1) utilizing some inherent information that contributes to the semantics of an image, such as text description or edge map, to guide the diffusion denoising process; and (2) enabling seamless adaptability to varying channel conditions with the help of a semantics-guided DM for channel denoising. The DM is guided by diverse semantic information and integrates seamlessly with DeepJSCC. In a slow fading channel, SGD-JSCC dynamically adapts to the instantaneous signal-to-noise ratio (SNR) directly estimated from the channel output, thereby eliminating the need for additional pilot transmissions for channel estimation. In a fast fading channel, we introduce a training-free denoising strategy, allowing SGD-JSCC to effectively adjust to fluctuations in channel gains. Numerical results demonstrate that, guided by semantic information and leveraging the powerful DM, our method outperforms existing DeepJSCC schemes, delivering satisfactory reconstruction performance even at extremely poor channel conditions.

* 13 pages, submitted to IEEE for possible publication

Via

Access Paper or Ask Questions