Zhihan Zhu

Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis

Oct 31, 2025

Weiming Chen, Yijia Wang, Zhihan Zhu, Zhihai He

Abstract: We consider the problem of ultra-low bitrate visual communication for remote vision analysis, human interaction, and control in challenging scenarios with very low communication bandwidth, such as deep space exploration, battlefield intelligence, and robot navigation in complex environments. In this paper, we ask the following question: can we accurately reconstruct the visual scene using only a very small portion of the bitrate of existing coding methods without sacrificing the accuracy of vision analysis or the performance of human interaction? Existing text-to-image generation models offer a new approach to ultra-low bitrate image description. However, they achieve only a semantic-level approximation of the visual scene, which is far from sufficient for visual communication, remote vision analysis, and human interaction. To address this issue, we propose to integrate image generation with deep image compression, using the text description and the coding latent jointly to guide rectified flow models for precise generation of the visual scene. The semantic text description and the coding latent are both encoded and transmitted to the decoder at a very small bitrate. Experimental results demonstrate that our method achieves the same image reconstruction quality and vision analysis accuracy as existing methods while using much less bandwidth. The code will be released upon paper acceptance.
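
As a rough illustration of what "a very small bitrate" means when a text description and a coding latent are sent together, the sketch below computes bits per pixel (bpp) for a combined payload. The caption, the 256-byte latent size, and the 512x512 resolution are hypothetical values chosen for the example; none of them comes from the paper.

```python
def bits_per_pixel(payload_bytes, width, height):
    """Return the bitrate in bits per pixel for a payload of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return 8.0 * payload_bytes / (width * height)


# Hypothetical payload: a short caption plus a small entropy-coded latent.
caption = "a rover on a rocky slope under an orange sky"
text_bytes = len(caption.encode("utf-8"))
latent_bytes = 256  # assumed size, for illustration only

bpp = bits_per_pixel(text_bytes + latent_bytes, 512, 512)
print(f"text: {text_bytes} B, latent: {latent_bytes} B, bitrate: {bpp:.5f} bpp")
```

The text part of such a payload is tiny compared with the pixel count, so in this toy setting the latent dominates the bitrate.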

Runge-Kutta Approximation and Decoupled Attention for Rectified Flow Inversion and Semantic Editing

Sep 16, 2025

Weiming Chen, Zhihan Zhu, Yijia Wang, Zhihai He

Abstract: Rectified flow (RF) models have recently demonstrated superior generative performance compared to DDIM-based diffusion models. However, in real-world applications, they face two major challenges: (1) low inversion accuracy, which reduces consistency with the source image, and (2) entangled multimodal attention in diffusion transformers, which hinders precise attention control. To address the first challenge, we propose an efficient high-order inversion method for rectified flow models based on the Runge-Kutta solver for differential equations. To tackle the second challenge, we introduce Decoupled Diffusion Transformer Attention (DDTA), a novel mechanism that disentangles text and image attention inside multimodal diffusion transformers, enabling more precise semantic control. Extensive experiments on image reconstruction and text-guided editing tasks demonstrate that our method achieves state-of-the-art performance in terms of fidelity and editability. Code is available at https://github.com/wmchen/RKSovler_DDTA.
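
To see why a higher-order solver can help inversion, the following minimal sketch compares a first-order Euler step with the classical fourth-order Runge-Kutta step on a scalar toy ODE dx/dt = v(x, t). It is not the paper's method: the `velocity` function is a hypothetical stand-in for a learned velocity network, and the convention that t = 1 is the data end and t = 0 is the noise end is an assumption made for the example.

```python
import math


def velocity(x, t):
    # Hypothetical toy velocity field standing in for a learned network.
    return math.cos(3.0 * t) * x + t


def euler_step(v, x, t, h):
    # First-order update: one velocity evaluation per step.
    return x + h * v(x, t)


def rk4_step(v, x, t, h):
    # Classical fourth-order Runge-Kutta update: four evaluations per step.
    k1 = v(x, t)
    k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = v(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(v, x, t0, t1, steps, step_fn):
    # Integrate from t0 to t1; h is negative when t1 < t0 (inversion).
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = step_fn(v, x, t, h)
        t += h
    return x


def round_trip_error(x_data, steps, step_fn):
    # Invert (t: 1 -> 0), regenerate (t: 0 -> 1), and measure the mismatch.
    z = integrate(velocity, x_data, 1.0, 0.0, steps, step_fn)
    x_rec = integrate(velocity, z, 0.0, 1.0, steps, step_fn)
    return abs(x_rec - x_data)


for name, step_fn in (("euler", euler_step), ("rk4", rk4_step)):
    print(name, round_trip_error(1.0, 10, step_fn))
```

With the same number of steps, the RK4 round trip returns much closer to the starting point than the Euler round trip, at the cost of four velocity evaluations per step instead of one.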

SPP-SBL: Space-Power Prior Sparse Bayesian Learning for Block Sparse Recovery

May 13, 2025

Yanhao Zhang, Zhihan Zhu, Yong Xia

Abstract: The recovery of block-sparse signals with unknown structural patterns remains a fundamental challenge in structured sparse signal reconstruction. By proposing a variance transformation framework, this paper unifies existing pattern-based block sparse Bayesian learning methods, and introduces a novel space power prior based on undirected graph models to adaptively capture the unknown patterns of block-sparse signals. By combining the EM algorithm with high-order equation root-solving, we develop a new structured sparse Bayesian learning method, SPP-SBL, which effectively addresses the open problem of space coupling parameter estimation in pattern-based methods. We further demonstrate that learning the relative values of space coupling parameters is key to capturing unknown block-sparse patterns and improving recovery accuracy. Experiments validate that SPP-SBL successfully recovers various challenging structured sparse signals (e.g., chain-structured signals and multi-pattern sparse signals) and real-world multi-modal structured sparse signals (images, audio), showing significant advantages in recovery accuracy across multiple metrics.

* 12 pages, 6 figures, 4 tables

Learning with Diversification from Block Sparse Signal

Feb 07, 2024

Yanhao Zhang, Zhihan Zhu, Yong Xia

Abstract: This paper introduces a novel prior called the Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data. By allowing diversification of the variance and correlation matrices, we effectively address the sensitivity of existing block sparse learning methods to pre-defined block information, which enables adaptive block estimation while mitigating the risk of overfitting. Based on this, a diversified block sparse Bayesian learning method (DivSBL) is proposed, using the EM algorithm and the dual ascent method for hyperparameter estimation. Moreover, we establish the global and local optimality theory of our model. Experiments validate the advantages of DivSBL over existing algorithms.

* 12 pages, 12 figures, 3 tables
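
For context on the kind of baseline the abstract refers to, here is a minimal sketch of EM-based block sparse Bayesian learning with pre-defined blocks. It is not DivSBL: it assumes an identity dictionary (pure denoising, y = x + noise), a known noise variance, one shared variance per block, and no intra-block correlation. The function name and test values are hypothetical.

```python
def block_sbl_denoise(y, blocks, noise_var, iters=200):
    """EM for y = x + noise, with x_i ~ N(0, gamma_b) for every i in block b.

    Returns the posterior mean of x and the learned per-block variances.
    """
    gammas = [1.0] * len(blocks)
    mu = [0.0] * len(y)
    for _ in range(iters):
        for b, idx in enumerate(blocks):
            g = gammas[b]
            # E-step: Gaussian posterior of each coefficient in the block.
            shrink = g / (g + noise_var)
            post_var = shrink * noise_var
            second_moment = 0.0
            for i in idx:
                mu[i] = shrink * y[i]
                second_moment += mu[i] ** 2 + post_var
            # M-step: block variance = average posterior second moment.
            gammas[b] = second_moment / len(idx)
    return mu, gammas


# Hypothetical data: the first block carries signal, the second only noise.
y = [2.0, -1.5, 1.8, 0.05, -0.1, 0.08]
blocks = [[0, 1, 2], [3, 4, 5]]
mu, gammas = block_sbl_denoise(y, blocks, noise_var=0.04)
print("posterior mean:", mu)
print("block variances:", gammas)
```

In this simplified setting the variance of a block whose mean energy is below the noise variance is driven toward zero, which prunes the whole block, while an active block keeps a variance close to its mean energy minus the noise variance. Because the block boundaries are fixed in advance, a wrong partition would mix active and inactive coefficients, which is the sensitivity the abstract describes.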
