Hongxia Wang

Guiding Diffusion Models with Semantically Degraded Conditions

Mar 11, 2026

Shilong Han, Yuming Zhang, Hongxia Wang

Abstract:Classifier-Free Guidance (CFG) is a cornerstone of modern text-to-image models, yet its reliance on a semantically vacuous null prompt ($\varnothing$) generates a guidance signal prone to geometric entanglement. This is a key factor limiting its precision, leading to well-documented failures in complex compositional tasks. We propose Condition-Degradation Guidance (CDG), a novel paradigm that replaces the null prompt with a strategically degraded condition, $\boldsymbol{c}_{\text{deg}}$. This reframes guidance from a coarse "good vs. null" contrast to a more refined "good vs. almost good" discrimination, thereby compelling the model to capture fine-grained semantic distinctions. We find that tokens in transformer text encoders split into two functional roles: content tokens encoding object semantics, and context-aggregating tokens capturing global context. By selectively degrading only the former, CDG constructs $\boldsymbol{c}_{\text{deg}}$ without external models or training. Validated across diverse architectures including Stable Diffusion 3, FLUX, and Qwen-Image, CDG markedly improves compositional accuracy and text-image alignment. As a lightweight, plug-and-play module, it achieves this with negligible computational overhead. Our work challenges the reliance on static, information-sparse negative samples and establishes a new principle for diffusion guidance: the construction of adaptive, semantically-aware negative samples is critical to achieving precise semantic control. Code is available at https://github.com/Ming-321/Classifier-Degradation-Guidance.

* Accepted to CVPR 2026
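
As a rough illustration of the guidance contrast described in the abstract above, the sketch below uses the standard CFG extrapolation and swaps the reference prediction from the null prompt to a degraded condition. The function names, the `content_mask` input, and the scaling-based degradation are hypothetical placeholders rather than the authors' implementation; the abstract does not specify how content tokens are degraded.

```python
import numpy as np

def guided_prediction(pred_cond, pred_ref, scale):
    """Extrapolate from a reference prediction toward the conditional one.

    With pred_ref from the null prompt this is standard CFG; with pred_ref
    from a degraded condition it mimics the "good vs. almost good" contrast.
    """
    return pred_ref + scale * (pred_cond - pred_ref)

def degrade_content_tokens(token_embeddings, content_mask, strength):
    """Hypothetical degradation: shrink only the content-token embeddings.

    Assumption: content_mask is a boolean array marking content tokens;
    context-aggregating tokens (mask False) are left untouched.
    """
    degraded = token_embeddings.copy()
    degraded[content_mask] *= 1.0 - strength
    return degraded

# Toy usage with random stand-ins for encoder and model outputs.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))              # 4 tokens, 8-dim embeddings
mask = np.array([True, True, False, False])   # first two are "content" tokens
tokens_deg = degrade_content_tokens(tokens, mask, strength=0.5)
pred_cond = rng.normal(size=(8,))             # prediction under the full prompt
pred_deg = rng.normal(size=(8,))              # prediction under the degraded prompt
guided = guided_prediction(pred_cond, pred_deg, scale=3.0)
```

In a real pipeline, `pred_cond` and `pred_deg` would come from two forward passes of the same diffusion model, one per condition, so the cost would match that of ordinary CFG.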

OCP-LS: An Efficient Algorithm for Visual Localization

Dec 31, 2025

Jindi Zhong, Hongxia Wang, Huanshui Zhang

Abstract:This paper proposes a novel second-order optimization algorithm. It aims to address large-scale optimization problems in deep learning by incorporating the OCP method and appropriately approximating the diagonal elements of the Hessian matrix. Extensive experiments on multiple standard visual localization benchmarks demonstrate the significant superiority of the proposed method. Compared with conventional optimization algorithms, our framework achieves competitive localization accuracy while exhibiting faster convergence, enhanced training stability, and improved robustness to noise interference.

A novel algorithm for optimizing bundle adjustment in image sequence alignment

Nov 10, 2024

Hailin Xu, Hongxia Wang, Huanshui Zhang

Abstract:The Bundle Adjustment (BA) model is commonly optimized using a nonlinear least squares method, with the Levenberg-Marquardt (L-M) algorithm being a typical choice. However, despite the L-M algorithm's effectiveness, its sensitivity to initial conditions often results in slower convergence when applied to poorly conditioned datasets, motivating the exploration of alternative optimization strategies. This paper introduces a novel algorithm for optimizing the BA model in the context of image sequence alignment for cryo-electron tomography, utilizing optimal control theory to directly optimize general nonlinear functions. The proposed Optimal Control Algorithm (OCA) exhibits superior convergence rates and effectively mitigates the oscillatory behavior frequently observed in the L-M algorithm. Extensive experiments on both synthetic and real-world datasets were conducted to evaluate the algorithm's performance. The results demonstrate that the OCA achieves faster convergence than the L-M algorithm. Moreover, the incorporation of a bisection-based update procedure significantly enhances the OCA's performance, particularly on poorly initialized datasets. These findings indicate that the OCA can substantially improve the efficiency of 3D reconstructions in cryo-electron tomography.
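
For context on the baseline named in the abstract above, here is a minimal sketch of a textbook Levenberg-Marquardt loop on a toy curve-fitting problem. It is not the proposed OCA and not a bundle adjustment solver; the toy model, the function names, and the factor-of-ten damping schedule are assumptions chosen only for illustration.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, params, iters=50, damping=1e-3):
    """Minimize 0.5 * ||residual(params)||^2 with a basic L-M loop."""
    params = np.asarray(params, dtype=float)
    cost = 0.5 * np.sum(residual(params) ** 2)
    for _ in range(iters):
        r, J = residual(params), jacobian(params)
        # Damped normal equations: (J^T J + damping * I) step = -J^T r
        step = np.linalg.solve(J.T @ J + damping * np.eye(params.size), -J.T @ r)
        new_cost = 0.5 * np.sum(residual(params + step) ** 2)
        if new_cost < cost:      # accept: move closer to Gauss-Newton
            params, cost, damping = params + step, new_cost, damping / 10.0
        else:                    # reject: move closer to gradient descent
            damping *= 10.0
    return params

# Toy problem (not from the paper): fit y = a * exp(b * x).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
residual = lambda p: p[0] * np.exp(p[1] * x) - y
jacobian = lambda p: np.stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)], axis=1)
fit = levenberg_marquardt(residual, jacobian, [1.0, 0.0])
```

The accept/reject rule is what makes the method sensitive to its starting point: a poor initialization tends to cause many rejected steps and large damping values, which slows progress.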

TVG: A Training-free Transition Video Generation Method with Diffusion Models

Aug 24, 2024

Rui Zhang, Yaosen Chen, Yuegen Liu, Wei Wang, Xuming Wen, Hongxia Wang

Abstract:Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes. We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training. Our method leverages Gaussian Process Regression ($\mathcal{GPR}$) to model latent representations, ensuring smooth and dynamic transitions between frames. Additionally, we introduce interpolation-based conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture to enhance temporal control and transition reliability. Evaluations on benchmark datasets and custom image pairs demonstrate the effectiveness of our approach in generating high-quality smooth transition videos. The code is provided at https://sobeymil.github.io/tvg.com.
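
To make the role of Gaussian Process Regression in the abstract above more concrete, the following sketch computes a zero-mean GP posterior mean between two endpoint latents. The RBF kernel, its length scale, the zero-mean prior, and the tiny 2-dimensional "latents" are all assumptions made for illustration; the abstract does not state which kernel or latent layout TVG uses.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gpr_mean(t_obs, latents_obs, t_query, length_scale=0.5, jitter=1e-8):
    """Posterior mean of a zero-mean GP fitted to (t_obs, latents_obs)."""
    k_obs = rbf_kernel(t_obs, t_obs, length_scale) + jitter * np.eye(len(t_obs))
    k_query = rbf_kernel(t_query, t_obs, length_scale)
    weights = np.linalg.solve(k_obs, latents_obs)   # shape: (n_obs, latent_dim)
    return k_query @ weights

# Two observed "frames" at t = 0 and t = 1, each a 2-dim stand-in latent.
t_obs = np.array([0.0, 1.0])
latents = np.array([[0.0, 0.0], [1.0, 2.0]])
path = gpr_mean(t_obs, latents, np.linspace(0.0, 1.0, 5))
```

The resulting `path` passes through both endpoint latents and varies smoothly in between, which is the property that makes a GP attractive for intermediate frames.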

A Unified Plug-and-Play Algorithm with Projected Landweber Operator for Split Convex Feasibility Problems

Aug 22, 2024

Shuchang Zhang, Hongxia Wang

Abstract:In recent years, Plug-and-Play (PnP) methods have achieved state-of-the-art performance in inverse imaging problems by replacing proximal operators with denoisers. Based on the proximal gradient method, some theoretical results on PnP have appeared, in which an appropriate step size is crucial for convergence analysis. However, in practical applications, applying PnP methods with theoretically guaranteed step sizes is difficult, and these algorithms are limited to Gaussian noise. In this paper, from the perspective of split convex feasibility problems (SCFP), an adaptive PnP algorithm with Projected Landweber Operator (PnP-PLO) is proposed to address these issues. Numerical experiments on image deblurring, super-resolution, and compressed sensing MRI illustrate that PnP-PLO with theoretical guarantees outperforms state-of-the-art methods such as RED and RED-PRO.
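
The split convex feasibility problem mentioned in the abstract above asks for a point x in a convex set C whose image Ax lies in a convex set Q. A minimal sketch of the classical Landweber-type (CQ) iteration for this problem is shown below. It uses a fixed step size and exact projections, so it is a generic baseline and not the adaptive PnP-PLO scheme; the function name and the toy sets are assumptions.

```python
import numpy as np

def cq_iterations(A, project_C, project_Q, x0, steps=200):
    """Landweber-type iteration for: find x in C such that A @ x is in Q."""
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # fixed step in (0, 2 / ||A||^2)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        Ax = A @ x
        # Gradient step on 0.5 * dist(Ax, Q)^2, then projection onto C.
        x = project_C(x - gamma * A.T @ (Ax - project_Q(Ax)))
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
x = cq_iterations(
    A,
    project_C=lambda v: np.maximum(v, 0.0),   # C: nonnegative vectors
    project_Q=lambda v: b,                    # Q: the single point {b}
    x0=np.zeros(2),
)
```

In a Plug-and-Play setting, the `project_C` callable is the slot where a denoiser would be substituted for the exact projection.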

HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration

Jul 18, 2024

Shuchang Zhang, Hui Zhang, Hongxia Wang

Abstract:Preconditioned Proximal Point (PPP) algorithms provide a unified framework for splitting methods in image restoration. Recent advancements with RED (Regularization by Denoising) and PnP (Plug-and-Play) priors have achieved state-of-the-art performance in this domain, emphasizing the need for a meaningful particular solution. However, degenerate PPP algorithms typically exhibit only weak convergence in infinite-dimensional Hilbert space, leading to uncertain solutions. To address this issue, we propose the Halpern-type Preconditioned Proximal Point (HPPP) algorithm, which leverages the strong convergence properties of Halpern iteration to achieve a particular solution. Based on the implicit regularization defined by gradient RED, we further introduce the Gradient REgularization by Denoising via HPPP algorithm, called GraRED-HP3. A toy example illustrates the regularity of the HPPP algorithm in converging to a particular solution. Additionally, experiments in image deblurring and inpainting validate the effectiveness of GraRED-HP3, showing it surpasses classical methods such as Chambolle-Pock (CP), PPP, RED, and RED-PRO.
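
The way Halpern iteration singles out a particular solution, as described in the abstract above, can be seen on a small example. The sketch below applies the classical Halpern scheme to a plain projection operator, not to a preconditioned proximal point operator; the operator, the anchor, and the step rule 1/(k+2) are assumptions picked for a toy demonstration.

```python
import numpy as np

def halpern(T, anchor, x0, iters=10000):
    """Halpern iteration: x_{k+1} = lam_k * anchor + (1 - lam_k) * T(x_k)."""
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        lam = 1.0 / (k + 2)          # classical choice: lam_k -> 0, sum diverges
        x = lam * anchor + (1.0 - lam) * T(x)
    return x

# Toy nonexpansive map: orthogonal projection onto the line y = x.
# Every point on that line is a fixed point, so the solution is not unique.
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
T = lambda v: direction * (direction @ v)

anchor = np.array([2.0, 0.0])
limit = halpern(T, anchor, x0=np.array([4.0, 0.0]))
```

Plain fixed-point iteration from `x0` would stop at (2, 2), whereas the Halpern iterates approach (1, 1), the fixed point closest to the anchor. Convergence is slow here, on the order of 1/k.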

UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization

Aug 28, 2023

Rui Zhang, Hongxia Wang, Mingshan Du, Hanqing Liu, Yang Zhou, Qiang Zeng

Abstract:The emergence of artificial intelligence-generated content (AIGC) has raised concerns about the authenticity of multimedia content in various fields. However, existing research for forgery content detection has focused mainly on binary classification tasks of complete videos, which has limited applicability in industrial settings. To address this gap, we propose UMMAFormer, a novel universal transformer framework for temporal forgery localization (TFL) that predicts forgery segments with multimodal adaptation. Our approach introduces a Temporal Feature Abnormal Attention (TFAA) module based on temporal feature reconstruction to enhance the detection of temporal differences. We also design a Parallel Cross-Attention Feature Pyramid Network (PCA-FPN) to optimize the Feature Pyramid Network (FPN) for subtle feature enhancement. To evaluate the proposed method, we contribute a novel Temporal Video Inpainting Localization (TVIL) dataset specifically tailored for video inpainting scenes. Our experiments show that our approach achieves state-of-the-art performance on benchmark datasets, including Lav-DF, TVIL, and Psynd, significantly outperforming previous methods. The code and data are available at https://github.com/ymhzyj/UMMAFormer/.

* Proceedings of the 31st ACM International Conference on Multimedia (MM '23), October 29-November 3, 2023
* 11 pages, 8 figures, 66 references. This paper has been accepted for ACM MM 2023

Phase Retrieval with Background Information: Decreased References and Efficient Methods

Aug 16, 2023

Ziyang Yuan, Haoxing Yang, Ningyi Leng, Hongxia Wang

Abstract:Fourier phase retrieval (PR) is a severely ill-posed inverse problem that arises in various applications. To guarantee a unique solution and reduce the dependence on initialization, background information can be exploited as a structural prior. However, the requirement for background information may be challenging to meet when moving to high-resolution imaging. At the same time, the previously proposed projected gradient descent (PGD) method also demands much background information. In this paper, we present an improved theoretical result on the demand for background information, along with two Douglas-Rachford (DR) based methods. Analytically, we demonstrate that the background required to ensure a unique solution can be decreased by nearly $1/2$ for 2-D signals compared to 1-D signals. By generalizing the results to $d$ dimensions, we show that background information whose length is more than $(2^{\frac{d+1}{d}}-1)$ times that of the signal is sufficient to ensure uniqueness. We also analyze the stability and robustness of the model when measurements and background information are corrupted by noise. Furthermore, two methods called Background Douglas-Rachford (BDR) and Convex Background Douglas-Rachford (CBDR) are proposed. BDR, a non-convex method, is proven to have a local R-linear convergence rate under mild assumptions. In contrast, the CBDR method uses convexification techniques and can be proven to have a global convergence guarantee as long as the background information is sufficient. To support this, a new property called F-RIP is established. We test the performance of the proposed methods through simulations as well as real experimental measurements, and demonstrate that they achieve a higher recovery rate with less background information compared to the PGD method.

Untrained neural network embedded Fourier phase retrieval from few measurements

Jul 16, 2023

Liyuan Ma, Hongxia Wang, Ningyi Leng, Ziyang Yuan

Abstract:Fourier phase retrieval (FPR) is a challenging task widely used in various applications. It involves recovering an unknown signal from its Fourier phaseless measurements. FPR with few measurements is important for reducing time and hardware costs, but it suffers from serious ill-posedness. Recently, untrained neural networks have offered new approaches by introducing learned priors to alleviate the ill-posedness without requiring any external data. However, they may not be ideal for reconstructing fine details in images and can be computationally expensive. This paper proposes an untrained neural network (NN) embedded algorithm based on the alternating direction method of multipliers (ADMM) framework to solve FPR with few measurements. Specifically, we use a generative network to represent the image to be recovered, which confines the image to the space defined by the network structure. To improve the ability to represent high-frequency information, total variation (TV) regularization is imposed to facilitate the recovery of local structures in the image. Furthermore, to reduce the computational cost mainly caused by the parameter updates of the untrained NN, we develop an accelerated algorithm that adaptively trades off between explicit and implicit regularization. Experimental results indicate that the proposed algorithm outperforms existing untrained NN-based algorithms with fewer computational resources and even performs competitively against trained NN-based algorithms.

Regularize implicit neural representation by itself

Mar 27, 2023

Zhemin Li, Hongxia Wang, Deyu Meng

Abstract:This paper proposes a regularizer called Implicit Neural Representation Regularizer (INRR) to improve the generalization ability of the Implicit Neural Representation (INR). The INR is a fully connected network that can represent signals with details not restricted by grid resolution. However, its generalization ability could be improved, especially with non-uniformly sampled data. The proposed INRR is based on learned Dirichlet Energy (DE) that measures similarities between rows/columns of the matrix. The smoothness of the Laplacian matrix is further integrated by parameterizing DE with a tiny INR. INRR improves the generalization of INR in signal representation by perfectly integrating the signal's self-similarity with the smoothness of the Laplacian matrix. Through well-designed numerical experiments, the paper also reveals a series of properties derived from INRR, including momentum methods like convergence trajectory and multi-scale similarity. Moreover, the proposed method could improve the performance of other signal representation methods.

* Highlight paper in CVPR 2023
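
The Dirichlet Energy that the INRR abstract above builds on has a simple closed form for a fixed graph: trace(X^T L X), with L = D - A the graph Laplacian of a similarity matrix A between rows. The sketch below evaluates it for a hand-picked similarity matrix. In the paper the Laplacian is learned and parameterized by a tiny INR, so the fixed `adjacency` here is purely an assumption for illustration.

```python
import numpy as np

def dirichlet_energy(X, adjacency):
    """Return trace(X^T L X) for the graph Laplacian L = D - adjacency."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.trace(X.T @ laplacian @ X)

# Toy 3-row matrix with a hand-picked symmetric similarity between rows.
X = np.array([[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]])
adjacency = np.array([[0.0, 1.0, 0.5],
                      [1.0, 0.0, 0.5],
                      [0.5, 0.5, 0.0]])
energy = dirichlet_energy(X, adjacency)
```

For a symmetric `adjacency`, the value equals half the sum over all row pairs of the similarity times the squared distance between the rows, so it is small when similar rows have similar values and zero when all rows are identical.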
