Zhou Wang

Boosting Image Quality Assessment Performance: Unsupervised Score Fusion by Deep Maximum a Posteriori Estimation

May 28, 2026

Zhongling Wang, Raymond Zhou, Shahrukh Athar, Wenbo Yang, Zhou Wang

Abstract: Over the past decades, numerous Image Quality Assessment (IQA) models have emerged, aiming to predict the perceptual quality of images. However, individual models are often biased toward certain types of image content or distortions, depending on their design principles and processes. An intuitive idea is to harness the strengths and mitigate the weaknesses of each IQA model by fusing the scores of multiple models into a stronger one. Here we make one of the first attempts to seek an optimal realization of this idea and propose a general framework for unsupervised IQA score fusion using deep Maximum a Posteriori (MAP) estimation. The proposed model conducts fine-grained uncertainty estimation at the score level to increase the accuracy and reduce the uncertainty of fused predictions. Comprehensive experiments demonstrate the superiority of the proposed model over individual IQA models and other fusion methods. It also exhibits an interesting capability of rejecting "bad" models in the fusion process.

* 2024 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
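
The deep MAP model itself is not described in enough detail to reproduce here. As a much simpler illustration of the underlying principle only, assume each model's score is a noisy Gaussian observation of a latent quality value, with scores already mapped to a common scale and a known per-score variance. The MAP estimate is then the precision-weighted mean, and a score with a very large variance receives almost no weight, loosely mirroring the "rejecting bad models" behavior. The function name `map_fuse` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def map_fuse(
    scores: Sequence[float],
    variances: Sequence[float],
    prior_mean: Optional[float] = None,
    prior_var: Optional[float] = None,
) -> Tuple[float, float]:
    """Fuse per-model quality scores into one Gaussian MAP estimate.

    Assumption (illustrative only): each score s_i ~ N(q, v_i) around a latent
    quality q, all scores share one scale, and the prior on q is either flat
    or N(prior_mean, prior_var) when both prior arguments are given. The
    posterior is then Gaussian, so its mode is the precision-weighted mean.
    Returns (MAP estimate, posterior variance).
    """
    if not scores or len(scores) != len(variances):
        raise ValueError("need one variance per score")
    precision = 0.0
    weighted_sum = 0.0
    if prior_mean is not None and prior_var is not None:
        precision += 1.0 / prior_var
        weighted_sum += prior_mean / prior_var
    for s, v in zip(scores, variances):
        if v <= 0:
            raise ValueError("variances must be positive")
        precision += 1.0 / v          # confident scores get large weights
        weighted_sum += s / v
    return weighted_sum / precision, 1.0 / precision
```

For example, `map_fuse([0.80, 0.78, 0.20], [0.01, 0.02, 1.0])` gives weights of 100, 50, and 1, so the outlying third score barely moves the fused estimate away from roughly 0.79.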

Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

Apr 20, 2026

Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang

Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihood displacement, and no general mechanism currently prevents this across objectives. We bridge this gap by presenting a unified incentive-score decomposition of preference optimization, revealing that diverse objectives share identical local update directions and differ only in their scalar weighting coefficients. Building on this decomposition, by analyzing the dynamics of the chosen/rejected likelihoods, we identify the disentanglement band (DB), a simple, testable condition that characterizes when training can avoid likelihood displacement by realizing the preferred pathway: suppressing the loser while maintaining the winner, possibly after an initial transient. Leveraging the DB, we propose a plug-and-play reward calibration (RC) that adaptively rebalances chosen versus rejected updates to satisfy the DB and mitigate likelihood displacement, without redesigning the base objective. Empirical results show that RC steers training toward more disentangled dynamics and often improves downstream performance across a range of objectives. Our code is available at https://github.com/IceyWuu/DisentangledPreferenceOptimization.
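
To make "identical update directions, different scalar weights" concrete, the sketch below uses standard DPO as one familiar margin-based objective; it does not reproduce the paper's decomposition or its reward calibration. For a single preference pair, the derivatives of the DPO loss with respect to the chosen and rejected policy log-likelihoods are equal in magnitude and opposite in sign, both scaled by one scalar, `beta * sigmoid(-margin)`. The helper names are hypothetical.

```python
import math
from typing import Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss_and_grads(
    logp_w: float, logp_l: float, ref_w: float, ref_l: float, beta: float = 0.1
) -> Tuple[float, float, float]:
    """Standard DPO loss for one pair and its derivatives with respect to the
    policy log-likelihoods of the chosen (w) and rejected (l) responses.

    Returns (loss, dL/dlogp_w, dL/dlogp_l).
    """
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    loss = -log_sigmoid(margin)
    # The single scalar weight shared by both terms: beta * sigmoid(-margin).
    weight = beta * math.exp(log_sigmoid(-margin))
    # Gradient descent raises logp_w and lowers logp_l with equal strength.
    return loss, -weight, weight
```

Because the parameter gradient is `-weight * (grad log p_w - grad log p_l)`, the scalar only sets how hard the pair is pushed; whether the chosen likelihood actually rises depends on how those two gradient directions interact, which is where likelihood displacement enters.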

SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

Mar 17, 2026

Mahdi Naseri, Zhou Wang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs that encode both known degradation profiles and emergent structural affinities to guide the learning process throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.

* Submitted to IEEE Transactions on Image Processing
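
The grouping idea behind the compositional distortion engine (continuous parameters, only one factor varying within a group) can be sketched at the parameter level. This is an illustrative sketch, not SHAMISA's engine: the three factor names and their ranges in `PARAM_RANGES` are hypothetical, and applying the configurations to actual pixels is left out.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical continuous parameter spaces for three distortion factors.
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "blur_sigma": (0.0, 3.0),
    "noise_std": (0.0, 0.1),
    "contrast_scale": (0.5, 1.0),
}


def sample_group(
    rng: random.Random, group_size: int = 4
) -> Tuple[str, List[Dict[str, float]]]:
    """Sample distortion configs in which exactly one factor varies.

    Every other factor is held at one shared random value, so members of the
    group differ only in the severity of the chosen factor.
    """
    base = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    varying = rng.choice(sorted(PARAM_RANGES))
    lo, hi = PARAM_RANGES[varying]
    # Sorting the sampled levels yields an ordered severity sweep.
    levels = sorted(rng.uniform(lo, hi) for _ in range(group_size))
    group = [{**base, varying: level} for level in levels]
    return varying, group
```

Applying every configuration in a group to the same source image would give a set of images whose only difference is one ordered severity axis, which is the kind of structure the abstract says is used to shape representational similarity.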

Automated Disentangling Analysis of Skin Colour for Lesion Images

Feb 25, 2026

Wenbo Yang, Eman Rezk, Walaa M. Moursi, Zhou Wang

Abstract: Machine-learning models applied to skin images often have degraded performance when the skin colour captured in images (SCCI) differs between training and deployment. These discrepancies arise from a combination of entangled environmental factors (e.g., illumination, camera settings) and intrinsic factors (e.g., skin tone) that cannot be accurately described by a single "skin tone" scalar, a simplification commonly adopted by prior work. To mitigate such colour mismatches, we propose a skin-colour disentangling framework that adapts disentanglement-by-compression to learn a structured, manipulable latent space for SCCI from unlabelled dermatology images. To prevent information leakage that hinders proper learning of dark colour features, we introduce a randomized, mostly monotonic decolourization mapping. To suppress unintended colour shifts of localized patterns (e.g., ink marks, scars) during colour manipulation, we further propose a geometry-aligned post-processing step. Together, these components enable faithful counterfactual editing and answering an essential question: "What would this skin condition look like under a different SCCI?", as well as direct colour transfer between images and controlled traversal along physically meaningful directions (e.g., blood perfusion, camera white balance), enabling educational visualization of skin conditions under varying SCCI. We demonstrate that dataset-level augmentation and colour normalization based on our framework achieve competitive lesion classification performance. Ultimately, our work promotes equitable diagnosis through creating diverse training datasets that include different skin tones and image-capturing conditions.

Vision-Inspired Image Quality Assessment for Radar-Based Human Activity Representations

Feb 24, 2026

Huy Trinh, Davis Liu, Munia Humaira, Peter Lee, Zhou Wang

Abstract: Radar-based human activity recognition (HAR) has gained attention as a privacy-preserving alternative to vision and wearable sensors, especially in sensitive environments like long-term care facilities. Micro-Doppler spectrograms derived from FMCW radar signals are central to recognizing dynamic activities, but their effectiveness is limited by noise and clutter. In this work, we use a benchmark radar dataset to reimplement and assess three recent denoising and preprocessing techniques: adaptive preprocessing, adaptive thresholding, and entropy-based denoising. To illustrate the shortcomings of conventional metrics in low-SNR regimes, we evaluate performance using both perceptual image quality measures and standard error-based metrics. We additionally propose a novel framework for static activity recognition using range-angle (RA) feature maps to expand HAR beyond dynamic activities. We present two important contributions: a temporal tracking algorithm to enforce consistency and a no-reference quality scoring algorithm to assess RA-map fidelity. Experimental results show that the proposed techniques improve classification performance and interpretability for both dynamic and static activities, opening the door to more reliable radar-based HAR systems.
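
The abstract names entropy-based denoising among the reimplemented techniques but does not describe the specific method. As a generic illustration of the idea only, the sketch below computes the normalized Shannon entropy of each spectrogram time frame and suppresses frames whose spectrum is nearly flat, on the assumption that flat frames are noise-dominated while a micro-Doppler signature concentrates power in a few bins. The threshold `max_entropy=0.9` and the function names are hypothetical.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # spectrogram[t][f]: non-negative power


def normalized_entropy(frame: List[float]) -> float:
    """Shannon entropy of a power frame treated as a probability
    distribution, divided by log(num_bins) so the result lies in [0, 1]."""
    total = sum(frame)
    if total <= 0 or len(frame) < 2:
        return 0.0
    h = -sum((p / total) * math.log(p / total) for p in frame if p > 0)
    return h / math.log(len(frame))


def entropy_gate(spec: Spectrogram, max_entropy: float = 0.9) -> Spectrogram:
    """Zero out time frames whose spectrum is close to flat (high entropy)
    and keep frames with concentrated energy unchanged."""
    return [
        list(frame) if normalized_entropy(frame) <= max_entropy else [0.0] * len(frame)
        for frame in spec
    ]
```

A perfectly flat frame has normalized entropy 1 and is removed, while a frame with all its power in one bin has entropy 0 and is kept.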

Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field

Feb 23, 2026

Sheyang Tang, Armin Shafiee Sarvestani, Jialu Xu, Xiaoyu Xu, Zhou Wang

Abstract: The aesthetic quality of a scene depends strongly on camera viewpoint. Existing approaches for aesthetic viewpoint suggestion are either single-view adjustments, predicting limited camera adjustments from a single image without understanding scene geometry, or 3D exploration approaches, which rely on dense captures or prebuilt 3D environments coupled with costly reinforcement learning (RL) searches. In this work, we introduce the notion of 3D aesthetic field that enables geometry-grounded aesthetic reasoning in 3D with sparse captures, allowing efficient viewpoint suggestions in contrast to costly RL searches. We opt to learn this 3D aesthetic field using a feedforward 3D Gaussian Splatting network that distills high-level aesthetic knowledge from a pretrained 2D aesthetic model into 3D space, enabling aesthetic prediction for novel viewpoints from only sparse input views. Building on this field, we propose a two-stage search pipeline that combines coarse viewpoint sampling with gradient-based refinement, efficiently identifying aesthetically appealing viewpoints without dense captures or RL exploration. Extensive experiments show that our method consistently suggests viewpoints with superior framing and composition compared to existing approaches, establishing a new direction toward 3D-aware aesthetic modeling.

* 14 pages, 10 figures
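
The two-stage search (coarse viewpoint sampling, then gradient-based refinement) can be sketched over a simplified two-parameter viewpoint. Here `score` is a hypothetical stand-in for the learned 3D aesthetic field; where the paper can differentiate through its field, this sketch substitutes central finite differences. The grid ranges, step size, and step count are assumptions, and the sketch neither wraps azimuth nor clamps elevation.

```python
import math
from typing import Callable, Tuple

View = Tuple[float, float]  # (azimuth, elevation) in radians


def two_stage_search(
    score: Callable[[float, float], float],
    grid: int = 12,
    steps: int = 200,
    lr: float = 0.05,
    eps: float = 1e-4,
) -> View:
    """Coarse-to-fine viewpoint search on a scalar aesthetic score."""
    # Stage 1: coarse sampling, azimuth in [0, 2*pi), elevation in [-pi/4, pi/4].
    candidates = [
        (2 * math.pi * i / grid, -math.pi / 4 + (math.pi / 2) * j / (grid - 1))
        for i in range(grid)
        for j in range(grid)
    ]
    az, el = max(candidates, key=lambda v: score(*v))
    # Stage 2: gradient ascent from the best coarse candidate, with
    # gradients estimated by central finite differences.
    for _ in range(steps):
        d_az = (score(az + eps, el) - score(az - eps, el)) / (2 * eps)
        d_el = (score(az, el + eps) - score(az, el - eps)) / (2 * eps)
        az, el = az + lr * d_az, el + lr * d_el
    return az, el
```

The coarse stage keeps the refinement from starting near a poor local optimum, while the refinement recovers precision that a coarse grid cannot provide.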

Towards a Universal Image Degradation Model via Content-Degradation Disentanglement

May 19, 2025

Wenbo Yang, Zhongling Wang, Zhou Wang

Abstract: Image degradation synthesis is highly desirable in a wide variety of applications ranging from image restoration to simulating artistic effects. Existing models are designed to generate one specific or a narrow set of degradations, which often require user-provided degradation parameters. As a result, they lack the generalizability to synthesize degradations beyond their initial design or adapt to other applications. Here we propose the first universal degradation model that can synthesize a broad spectrum of complex and realistic degradations containing both homogeneous (global) and inhomogeneous (spatially varying) components. Our model automatically extracts and disentangles homogeneous and inhomogeneous degradation features, which are later used for degradation synthesis without user intervention. A disentangle-by-compression method is proposed to separate degradation information from images. Two novel modules for extracting and incorporating inhomogeneous degradations are created to model inhomogeneous components in complex degradations. We demonstrate the model's accuracy and adaptability in film-grain simulation and blind image restoration tasks. The demo video, code, and dataset of this project will be released upon publication at github.com/yangwenbo99/content-degradation-disentanglement.

Architectural Exploration of Hybrid Neural Decoders for Neuromorphic Implantable BMI

May 09, 2025

Vivek Mohan, Biyan Zhou, Zhou Wang, Anil Bharath, Emmanuel Drakakis, Arindam Basu

Abstract: This work presents an efficient decoding pipeline for neuromorphic implantable brain-machine interfaces (Neu-iBMI), leveraging sparse neural event data from an event-based neural sensing scheme. We introduce a tunable event filter (EvFilter), which also functions as a spike detector (EvFilter-SPD), significantly reducing the number of events processed for decoding by 192X and 554X, respectively. The proposed pipeline achieves high decoding performance, up to R^2=0.73, with ANN- and SNN-based decoders, eliminating the need for signal recovery, spike detection, or sorting, which are commonly performed in conventional iBMI systems. The SNN-Decoder reduces the computation and memory required by 5-23X compared to NN- and LSTM-Decoders, while the ST-NN-Decoder delivers performance similar to that of an LSTM-Decoder while requiring 2.5X fewer resources. This streamlined approach significantly reduces computational and memory demands, making it ideal for low-power, on-implant, or wearable iBMIs.

* The paper has been accepted for lecture presentation at the 2025 IEEE International Symposium on Circuits and Systems in London
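
Decoding performance above is reported as R^2 (up to 0.73). For reference, the standard coefficient of determination compares the residual error of the decoded signal with the variance of the true signal; a minimal sketch follows. How the authors aggregate R^2 across output dimensions or trials is not stated in the abstract, so this handles a single one-dimensional signal only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Equals 1 for a perfect prediction, 0 for always predicting the mean of
    y_true, and can be negative for predictions worse than that.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 samples")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```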

Omnidirectional Image Quality Captioning: A Large-scale Database and A New Model

Feb 21, 2025

Jiebin Yan, Ziwen Tan, Yuming Fang, Junjie Chen, Wenhui Jiang, Zhou Wang

Abstract: The fast-growing application of omnidirectional images calls for effective approaches for omnidirectional image quality assessment (OIQA). Existing OIQA methods have been developed and tested on homogeneously distorted omnidirectional images, but it is hard to transfer their success directly to heterogeneously distorted omnidirectional images. In this paper, we conduct the largest study so far on OIQA, where we establish a large-scale database called OIQ-10K containing 10,000 omnidirectional images with both homogeneous and heterogeneous distortions. A comprehensive psychophysical study is conducted to collect human opinions for each omnidirectional image, together with the spatial distributions (within local regions or globally) of distortions, and the head and eye movements of the subjects. Furthermore, we propose a novel multitask-derived adaptive feature-tailoring OIQA model named IQCaption360, which is capable of generating a quality caption for an omnidirectional image in the form of a textual template. Extensive experiments demonstrate the effectiveness of IQCaption360, which outperforms state-of-the-art methods by a significant margin on the proposed OIQ-10K database. The OIQ-10K database and the related source code are available at https://github.com/WenJuing/IQCaption360.

Structural Similarity in Deep Features: Image Quality Assessment Robust to Geometrically Disparate Reference

Dec 27, 2024

Keke Zhang, Weiling Chen, Tiesong Zhao, Zhou Wang

Abstract: Image Quality Assessment (IQA) with references plays an important role in optimizing and evaluating computer vision tasks. Traditional methods assume that all pixels of the reference and test images are fully aligned. Such Aligned-Reference IQA (AR-IQA) approaches fail to address many real-world problems with various geometric deformations between the two images. Although significant effort has been made to attack the Geometrically-Disparate-Reference IQA (GDR-IQA) problem, it has been addressed in a task-dependent fashion, for example, by dedicated designs for image super-resolution and retargeting, or by assuming that the geometric distortions are small enough to be countered by translation-robust filters or by explicit image registration. Here we rethink this problem and propose a unified, non-training-based Deep Structural Similarity (DeepSSIM) approach to address the above problems in a single framework, which assesses structural similarity of deep features in a simple but efficient way and uses an attention calibration strategy to alleviate attention deviation. The proposed method, without application-specific design, achieves state-of-the-art performance on AR-IQA datasets and meanwhile shows strong robustness to various GDR-IQA test cases. Interestingly, our tests also show the effectiveness of DeepSSIM as an optimization tool for training image super-resolution, enhancement and restoration, implying an even wider generalizability. Source code will be made public after the review is completed.
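
DeepSSIM's exact formulation, its feature extractor, and its attention calibration are not given in the abstract. As a point of reference only, the sketch below applies the classic SSIM expression (luminance, contrast, and structure terms combined through means, variances, and covariance) to flattened deep-feature channels and averages over channels. The constants `c1 = 1e-4` and `c2 = 9e-4` are the usual SSIM defaults for a unit data range and are an assumption here, as are the function names.

```python
from typing import Sequence


def ssim_1d(
    x: Sequence[float], y: Sequence[float], c1: float = 1e-4, c2: float = 9e-4
) -> float:
    """Classic SSIM expression on two equal-length vectors:
    (2*mx*my + c1)(2*cov + c2) / ((mx^2 + my^2 + c1)(vx + vy + c2))."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def mean_channel_ssim(
    feats_a: Sequence[Sequence[float]], feats_b: Sequence[Sequence[float]]
) -> float:
    """Average ssim_1d over channels; each argument is a list of channels,
    each channel a flattened list of activations."""
    if not feats_a or len(feats_a) != len(feats_b):
        raise ValueError("need the same non-zero number of channels")
    return sum(ssim_1d(a, b) for a, b in zip(feats_a, feats_b)) / len(feats_a)
```

Identical features score 1, and anti-correlated channels drive the structure term negative, so the score falls below 0.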
