Shiqi Wang

PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework

Feb 04, 2025

Hongwei Li, Yuheng Tang, Shiqi Wang, Wenbo Guo

Abstract: Recent research builds various patching agents that combine large language models (LLMs) with non-ML tools and achieve promising results on the state-of-the-art (SOTA) software patching benchmark, SWE-Bench. Based on how the patching workflow is determined, existing patching agents can be categorized as agent-based planning methods, which rely on LLMs for planning, and human-based planning methods, which follow a pre-defined workflow. At a high level, agent-based planning methods achieve high patching performance but at high cost and with limited stability. Human-based planning methods, on the other hand, are more stable and efficient but have key workflow limitations that compromise their patching performance. In this paper, we propose PatchPilot, an agentic patcher that strikes a balance between patching efficacy, stability, and cost-efficiency. PatchPilot proposes a novel human-based planning workflow with five components: reproduction, localization, generation, validation, and refinement (where refinement is unique to PatchPilot). We introduce novel and customized designs to each component to optimize their effectiveness and efficiency. Through extensive experiments on the SWE-Bench benchmarks, PatchPilot shows superior performance to existing open-source methods while maintaining low cost (less than $1 per instance) and ensuring higher stability. We also conduct a detailed ablation study to validate the key designs in each component.

Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision

Jan 08, 2025

Kangsheng Yin, Quan Liu, Xuelin Shen, Yulin He, Wenhan Yang, Shiqi Wang

Abstract: Image compression models have long struggled with adaptability and generalization, as the decoded bitstream typically serves only human or machine needs and fails to preserve information for unseen visual tasks. Therefore, this paper innovatively introduces supervision obtained from multimodal pre-training models and incorporates adaptive multi-objective optimization tailored to support both human visual perception and machine vision simultaneously with a single bitstream, denoted as Unified and Generalized Image Coding for Machine (UG-ICM). Specifically, to remove the reliance of compression models on downstream task supervision, we introduce Contrastive Language-Image Pre-training (CLIP) models into the training constraint for improved generalization. Global-to-instance-wise CLIP supervision is applied to help obtain hierarchical semantics that make models more generalizable for tasks relying on information of different granularity. Furthermore, to support both human and machine vision with a single unified bitstream, we incorporate a conditional decoding strategy that takes human or machine preferences as conditions, enabling the bitstream to be decoded into different versions for the corresponding preferences. As such, our proposed UG-ICM is fully trained in a self-supervised manner, i.e., without awareness of any specific downstream models and tasks. Extensive experiments show that the proposed UG-ICM achieves remarkable improvements in various unseen machine analytics tasks, while simultaneously providing perceptually satisfying images.

* 9 pages, 10 figures, published at AAAI 2025

AI-generated Image Quality Assessment in Visual Communication

Dec 20, 2024

Yu Tian, Yixuan Li, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Sam Kwong

Abstract: Assessing the quality of artificial intelligence-generated images (AIGIs) plays a crucial role in their application in real-world scenarios. However, traditional image quality assessment (IQA) algorithms primarily focus on low-level visual perception, while existing IQA works on AIGIs overemphasize the generated content itself, neglecting its effectiveness in real-world applications. To bridge this gap, we propose AIGI-VC, a quality assessment database for AI-Generated Images in Visual Communication, which studies the communicability of AIGIs in the advertising field from the perspectives of information clarity and emotional interaction. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning. We conduct an empirical study of existing representative IQA methods and large multi-modal models on the AIGI-VC dataset, uncovering their strengths and weaknesses.

* AAAI-2025; Project page: https://github.com/ytian73/AIGI-VC

LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model

Dec 05, 2024

Yuan Xue, Qi Zhang, Chuanmin Jia, Shiqi Wang

Abstract: Image Compression for Machines (ICM) aims to compress images for machine vision tasks rather than human viewing. Current works predominantly concentrate on high-level tasks like object detection and semantic segmentation. However, the quality of original images is usually not guaranteed in the real world, leading to even worse perceptual quality or downstream task performance after compression. Low-level (LL) machine vision models, like image restoration models, can help improve such quality, and thereby their compression requirements should also be considered. In this paper, we propose a pioneering ICM framework for LL machine vision tasks, namely LL-ICM. By jointly optimizing compression and LL tasks, the proposed LL-ICM not only enriches its encoding ability in generalizing to versatile LL tasks but also optimizes the processing ability of downstream LL task models, achieving mutual adaptation for image codecs and LL task models. Furthermore, we integrate large-scale vision-language models into the LL-ICM framework to generate more universal and distortion-robust feature embeddings for LL vision tasks. Therefore, one LL-ICM codec can generalize to multiple tasks. We establish a solid benchmark to evaluate LL-ICM, which includes extensive objective experiments using both full-reference and no-reference image quality assessments. Experimental results show that LL-ICM can achieve 22.65% BD-rate reductions over the state-of-the-art methods.

An Information-Theoretic Regularizer for Lossy Neural Image Compression

Nov 23, 2024

Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, Shiqi Wang

Abstract: Lossy image compression networks aim to minimize the latent entropy of images while adhering to specific distortion constraints. However, optimizing the neural network can be challenging due to its nature of learning quantized latent representations. In this paper, our key finding is that minimizing the latent entropy is, to some extent, equivalent to maximizing the conditional source entropy, an insight that is deeply rooted in information-theoretic equalities. Building on this insight, we propose a novel structural regularization method for the neural image compression task by incorporating the negative conditional source entropy into the training objective, such that both the optimization efficacy and the model's generalization ability can be promoted. The proposed information-theoretic regularizer is interpretable, plug-and-play, and imposes no inference overheads. Extensive experiments demonstrate its superiority in regularizing the models and further squeezing bits from the latent representation across various compression structures and unseen domains.
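
The link between latent entropy and conditional source entropy mentioned in this abstract can be illustrated with a standard identity. The following is only a sketch under a simplifying assumption that is not taken from the paper: the source X is discrete and the latent Y = f(X) is a deterministic function of it. The paper's actual setting and derivation may differ.

```latex
\begin{align}
I(X;Y) &= H(Y) - H(Y \mid X) = H(X) - H(X \mid Y), \\
H(Y \mid X) &= 0 \quad \text{(assumption: } Y = f(X) \text{ is deterministic)}, \\
\Rightarrow\quad H(Y) &= H(X) - H(X \mid Y).
\end{align}
```

Because H(X) is fixed by the source, lowering H(Y) under this assumption is the same as raising H(X | Y), which is consistent with the "to some extent, equivalent" wording above.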

* 12 pages, 8 figures

Compact Visual Data Representation for Green Multimedia -- A Human Visual System Perspective

Nov 21, 2024

Peilin Chen, Xiaohan Fang, Meng Wang, Shiqi Wang, Siwei Ma

Abstract: The Human Visual System (HVS), with its intricate sophistication, is capable of achieving ultra-compact information compression for visual signals. This remarkable ability is coupled with high generalization capability and energy efficiency. By contrast, the state-of-the-art Versatile Video Coding (VVC) standard achieves a compression ratio of around 1,000 times for raw visual data. This notable disparity motivates the research community to draw inspiration to effectively handle the immense volume of visual data in a green way. Therefore, this paper provides a survey of how visual data can be efficiently represented for green multimedia, in particular when the ultimate task is knowledge extraction instead of visual signal reconstruction. We introduce recent research efforts that promote green, sustainable, and efficient multimedia in this field. Moreover, we discuss how the deep understanding of the HVS can benefit the research community, and envision the development of future green multimedia technologies.

Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need

Nov 19, 2024

Kecheng Chen, Pingping Zhang, Hui Liu, Jie Liu, Yibing Liu, Jixin Huang, Shiqi Wang, Hong Yan, Haoliang Li

Abstract: We have recently witnessed that "Intelligence" and "Compression" are two sides of the same coin, where the large language model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing need to compress high-resolution images in the current streaming media era. Consequently, a natural question arises: Can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to realizing the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P²-LLM, a next-pixel prediction-based LLM, which integrates various elaborated insights and methodologies, e.g., pixel-level priors, the in-context ability of LLM, and a pixel-level semantic preservation strategy, to enhance the understanding capacity of pixel sequences for better next-pixel predictions. Extensive experiments on benchmark datasets demonstrate that P²-LLM can beat SOTA classical and learned codecs.
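
The connection between next-pixel prediction and lossless compression in this abstract rests on a general fact: an entropy coder driven by a predictive model approaches a code length of the sum of -log2 p(pixel | context). The sketch below illustrates only that general principle. The function names and the two toy predictors are hypothetical stand-ins and are not the paper's model or code.

```python
import math
from collections import Counter


def ideal_code_length_bits(pixels, predict):
    """Sum of -log2 p(pixel | previous pixels), the length an ideal entropy coder approaches."""
    total = 0.0
    for i, value in enumerate(pixels):
        probs = predict(pixels[:i])  # distribution over values 0..255 given the context
        total += -math.log2(probs[value])
    return total


def uniform_predictor(context):
    """Hypothetical baseline: every 8-bit value is equally likely."""
    return [1.0 / 256] * 256


def adaptive_count_predictor(context):
    """Hypothetical toy model: Laplace-smoothed frequencies of the values seen so far."""
    counts = Counter(context)
    n = len(context)
    return [(counts.get(v, 0) + 1) / (n + 256) for v in range(256)]


# A flat 64-pixel patch: highly predictable, so a better predictor needs fewer bits.
pixels = [10] * 64
uniform_bits = ideal_code_length_bits(pixels, uniform_predictor)
adaptive_bits = ideal_code_length_bits(pixels, adaptive_count_predictor)
print(f"uniform: {uniform_bits:.1f} bits, adaptive: {adaptive_bits:.1f} bits")
```

The uniform predictor costs exactly 8 bits per pixel, while the adaptive one spends fewer bits as its predictions sharpen. A stronger predictor, such as an LLM over pixel sequences, would play the role of `predict` in the same accounting.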

Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment

Nov 19, 2024

Siyi Pan, Baoliang Chen, Danni Huang, Hanwei Zhu, Lingyu Zhu, Xiangjie Sui, Shiqi Wang

Abstract: Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LMMs are primarily trained for high-level tasks (e.g., image captioning), emphasizing unified image semantics extraction under varied quality. Such semantic-aware yet quality-insensitive perception bias inevitably leads to a heavy reliance on image semantics when those LMMs are asked to rate quality. In this paper, instead of costly retraining or tuning of an LMM, we propose a training-free debiasing framework, in which the image quality prediction is rectified by mitigating the bias caused by image semantics. Specifically, we first explore several semantic-preserving distortions that can significantly degrade image quality while maintaining identifiable semantics. By applying these specific distortions to the query or test images, we ensure that the degraded images are recognized as poor quality while their semantics remain. During quality inference, both a query image and its corresponding degraded version are fed to the LMM along with a prompt indicating that the query image quality should be inferred under the condition that the degraded one is deemed poor quality. This prior condition effectively aligns the LMM's quality perception, as all degraded images are consistently rated as poor quality, regardless of their semantic difference. Finally, the quality scores of the query image inferred under different prior conditions (degraded versions) are aggregated using a conditional probability model. Extensive experiments on various IQA datasets show that our debiasing framework consistently enhances LMM performance. The code will be made publicly available.

Standardizing Generative Face Video Compression using Supplemental Enhancement Information

Oct 19, 2024

Bolin Chen, Yan Ye, Jie Chen, Ru-Ling Liao, Shanzhi Yin, Shiqi Wang, Kaifa Yang, Yue Li, Yiling Xu, Ye-Kui Wang (+5 more)

Abstract: This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI), where a series of compact spatial and temporal representations of a face video signal (i.e., 2D/3D key-points, facial semantics and compact features) can be coded using SEI messages and inserted into the coded video bitstream. At the time of writing, the proposed GFVC approach is an official "technology under consideration" (TuC) for standardization by the Joint Video Experts Team (JVET) of ISO/IEC JTC 1/SC 29 and ITU-T SG16. To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression. The proposed SEI approach has not only advanced the reconstruction quality of early-day Model-Based Coding (MBC) via the state-of-the-art generative technique, but also established a new SEI definition for future GFVC applications and deployment. Experimental results illustrate that the proposed SEI-based GFVC approach can achieve remarkable rate-distortion performance compared with the latest Versatile Video Coding (VVC) standard, whilst also potentially enabling a wide variety of functionalities including user-specified animation/filtering and metaverse-related applications.

Test-time adaptation for image compression with distribution regularization

Oct 16, 2024

Kecheng Chen, Pingping Zhang, Tiexin Qin, Shiqi Wang, Hong Yan, Haoliang Li

Abstract: Current test- or compression-time adaptation image compression (TTA-IC) approaches, which leverage both latent and decoder refinements as a two-step adaptation scheme, have potentially enhanced the rate-distortion (R-D) performance of learned image compression models on cross-domain compression tasks, e.g., from natural to screen content images. However, compared with the emergence of various decoder refinement variants, the latent refinement, as an inseparable ingredient, is barely tailored to cross-domain scenarios. To this end, we aim to develop an advanced latent refinement method by extending the effective hybrid latent refinement (HLR) method, which is designed for in-domain inference improvement but shows noticeable degradation of the rate cost in cross-domain tasks. Specifically, we first provide theoretical analyses, in terms of marginalization approximation from in- to cross-domain scenarios, to uncover that the vanilla HLR suffers from an underlying mismatch between refined Gaussian conditional and hyperprior distributions, leading to deteriorated joint probability approximation of the marginal distribution with increased rate consumption. To remedy this issue, we introduce a simple Bayesian approximation-endowed distribution regularization to encourage learning a better joint probability approximation in a plug-and-play manner. Extensive experiments on six in- and cross-domain datasets demonstrate that our proposed method not only improves the R-D performance compared with other latent refinement counterparts, but also can be flexibly integrated into existing TTA-IC methods with incremental benefits.
