
Shiqi Wang


Generative Face Video Coding Techniques and Standardization Efforts: A Review

Nov 05, 2023
Bolin Chen, Jie Chen, Shiqi Wang, Yan Ye

Generative Face Video Coding (GFVC) techniques exploit the compact representation of facial priors and the strong inference capability of deep generative models to achieve high-quality face video communication in ultra-low bandwidth scenarios. This paper conducts a comprehensive survey of recent advances in GFVC techniques and standardization efforts, which could be applied to ultra-low bitrate communication, user-specified animation/filtering, and metaverse-related functionalities. In particular, we generalize GFVC systems within one coding framework and summarize different GFVC algorithms with their corresponding visual representations. Moreover, we review the GFVC standardization activities that are specified with supplemental enhancement information messages. Finally, we discuss the fundamental challenges and broad applications of GFVC techniques and their standardization potential, and envision future trends. The project page can be found at https://github.com/Berlin0610/Awesome-Generative-Face-Video-Coding.
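The pipeline below is a minimal, hypothetical sketch of the generic GFVC framework the abstract summarizes: one conventionally coded key frame plus very compact per-frame facial parameters on the encoder side, and a generative model that animates the key frame on the decoder side. The feature extractor and animator here are placeholders, not any specific or standardized GFVC algorithm.

```python
# Hedged sketch of a generic GFVC pipeline; helper functions are stand-ins for
# learned keypoint/compact-feature extractors and a deep generative decoder.
import numpy as np

def extract_compact_features(frame: np.ndarray) -> np.ndarray:
    """Placeholder for a learned facial-prior extractor (e.g., 2D keypoints)."""
    # A real GFVC encoder runs a trained network and quantizes its compact output;
    # here we just take coarse 4x4 block means as a stand-in "parameter set".
    h, w = frame.shape
    blocks = frame[: h - h % 4, : w - w % 4].reshape(4, (h - h % 4) // 4, 4, (w - w % 4) // 4)
    return blocks.mean(axis=(1, 3)).flatten()          # 16 numbers per frame

def animate(key_frame: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Placeholder for the generative decoder that warps/synthesizes a frame."""
    return np.clip(key_frame * (1.0 + 0.01 * params.mean()), 0, 255)

# Encoder side: send one key frame (conventionally coded) plus tiny parameter sets.
video = [np.random.rand(64, 64) * 255 for _ in range(5)]
key_frame = video[0]
bitstream = [extract_compact_features(f) for f in video[1:]]

# Decoder side: reconstruct every inter frame from the key frame and its parameters.
reconstruction = [key_frame] + [animate(key_frame, p) for p in bitstream]
print(len(reconstruction), reconstruction[1].shape)
```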

Pixel-Inconsistency Modeling for Image Manipulation Localization

Sep 30, 2023
Chenqi Kong, Anwei Luo, Shiqi Wang, Haoliang Li, Anderson Rocha, Alex C. Kot

Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methodologies exhibit limitations when deployed to unseen datasets and perturbed images (i.e., they lack generalization and robustness in real-world applications). To circumvent these problems and aid image integrity, this paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts. The rationale is grounded in the observation that most image signal processors (ISPs) involve a demosaicing process, which introduces pixel correlations in pristine images. Moreover, manipulation operations, including splicing, copy-move, and inpainting, directly disrupt such pixel regularity. We therefore first split the input image into several blocks and design masked self-attention mechanisms to model the global pixel dependency in input images. Simultaneously, we optimize another local pixel dependency stream to mine local manipulation clues within input forgery images. In addition, we design novel Learning-to-Weight Modules (LWM) to combine features from the two streams, thereby enhancing the final forgery localization performance. To improve the training process, we propose a novel Pixel-Inconsistency Data Augmentation (PIDA) strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces. This work establishes a comprehensive benchmark integrating 15 representative detection models across 12 datasets. Extensive experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints and achieves state-of-the-art generalization and robustness in image manipulation localization.
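As a rough illustration of the fusion step described above, the sketch below combines a global (masked self-attention) feature stream and a local feature stream with a small learned per-pixel gate. Module names, channel sizes, and the gating design are assumptions for illustration, not the paper's actual Learning-to-Weight Module.

```python
# Hypothetical two-stream fusion: a per-pixel weight decides how much each
# stream contributes before a 1x1 head predicts the forgery-mask logits.
import torch
import torch.nn as nn

class LearnToWeightFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel weight in [0, 1] from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.head = nn.Conv2d(channels, 1, kernel_size=1)  # forgery-mask logits

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([global_feat, local_feat], dim=1))
        fused = w * global_feat + (1.0 - w) * local_feat
        return self.head(fused)

fusion = LearnToWeightFusion(channels=64)
g = torch.randn(1, 64, 32, 32)   # global pixel-dependency features
l = torch.randn(1, 64, 32, 32)   # local pixel-dependency features
print(fusion(g, l).shape)        # per-pixel manipulation logits: (1, 1, 32, 32)
```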

Perceptual Quality Assessment of 360$^\circ$ Images Based on Generative Scanpath Representation

Sep 07, 2023
Xiangjie Sui, Hanwei Zhu, Xuelin Liu, Yuming Fang, Shiqi Wang, Zhou Wang

Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that significantly sway user gaze patterns, and the overreliance on a single viewport sequence from the 360$^\circ$ image for quality inference. To address these issues, we introduce a unique generative scanpath representation (GSR) for effective quality inference of 360$^\circ$ images, which aggregates the varied perceptual experiences of multi-hypothesis users under a predefined viewing condition. More specifically, given a viewing condition characterized by the starting point of viewing and the exploration time, a set of scanpaths consisting of dynamic visual fixations can be produced using an apt scanpath generator. In this vein, we use the scanpaths to convert the 360$^\circ$ image into the unique GSR, which provides a global overview of gaze-focused content derived from the scanpaths. As such, the quality inference of the 360$^\circ$ image is swiftly transformed into that of the GSR. We then propose an efficient OIQA computational framework by learning the quality maps of the GSR. Comprehensive experimental results validate that the predictions of the proposed framework are highly consistent with human perception in the spatiotemporal domain, especially in the challenging context of locally distorted 360$^\circ$ images under varied viewing conditions. The code will be released at https://github.com/xiangjieSui/GSR

* 12 pages, 5 figures 
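The snippet below is a hedged illustration of the core conversion the abstract describes: turning a 360$^\circ$ image plus a scanpath of fixations into a sequence of gazed regions. It approximates viewports by plain crops on the equirectangular image, whereas the actual GSR uses a learned scanpath generator and proper sphere-to-plane projection.

```python
# Toy "image -> sequence of gazed regions" conversion; the scanpath and image
# are random stand-ins, and crops only approximate true viewport projection.
import numpy as np

def crop_around(img: np.ndarray, lon: float, lat: float, size: int = 64) -> np.ndarray:
    """Approximate a viewport by a crop centred at (lon, lat) in degrees."""
    h, w = img.shape[:2]
    cx = int((lon + 180.0) / 360.0 * w) % w
    cy = int((90.0 - lat) / 180.0 * h)
    y0 = np.clip(cy - size // 2, 0, h - size)
    x0 = np.clip(cx - size // 2, 0, w - size)
    return img[y0:y0 + size, x0:x0 + size]

equirect = np.random.rand(512, 1024, 3)               # stand-in 360-degree image
scanpath = [(-30.0, 10.0), (0.0, 0.0), (45.0, -5.0)]  # (lon, lat) fixations
gsr_like = np.stack([crop_around(equirect, lon, lat) for lon, lat in scanpath])
print(gsr_like.shape)  # (num_fixations, 64, 64, 3): sequence fed to a quality model
```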

Extreme Image Compression using Fine-tuned VQGAN Models

Aug 07, 2023
Qi Mao, Tinghan Yang, Yinuo Zhang, Shuyin Pan, Meng Wang, Shiqi Wang, Siwei Ma

Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially at low bitrates. Nevertheless, their efficacy and applicability at extreme compression ratios ($<0.1$ bpp) remain constrained. In this work, we propose a simple yet effective coding framework that introduces vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as a map of VQ indices by finding the nearest codeword for each latent vector, and the indices can be encoded into bitstreams using lossless compression methods. We then propose clustering a pre-trained large-scale codebook into smaller codebooks using the K-means algorithm. This enables images to be represented as VQ-index maps over a range of codebook sizes, resulting in variable bitrates and different levels of reconstruction quality. Extensive qualitative and quantitative experiments on various datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates.

* Generative Compression, Extreme Compression, VQGANs, Low Bitrate 
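A minimal sketch of the two steps highlighted in the abstract: shrinking a pretrained VQGAN codebook with K-means, then mapping latents to nearest-codeword indices that can be entropy-coded. The codebook, latents, and sizes below are random stand-ins, not the authors' models or settings.

```python
# Illustrative only: smaller codebook -> fewer bits per index -> lower bitrate.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
full_codebook = rng.normal(size=(1024, 256))   # stand-in pretrained codebook: 1024 codewords, dim 256

# Step 1: cluster the large codebook into a smaller one.
small_k = 64
kmeans = KMeans(n_clusters=small_k, n_init=10, random_state=0).fit(full_codebook)
small_codebook = kmeans.cluster_centers_

# Step 2: quantize encoder latents to nearest-codeword indices.
latents = rng.normal(size=(16 * 16, 256))      # one 16x16 latent map, flattened
dists = ((latents[:, None, :] - small_codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)                 # what gets losslessly entropy-coded

bits_per_index = np.log2(small_k)
print(indices.shape, f"{bits_per_index:.0f} bits/index before entropy coding")
```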

Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation

Jul 10, 2023
Chongming Gao, Kexin Huang, Jiawei Chen, Yuan Zhang, Biao Li, Peng Jiang, Shiqi Wang, Zhong Zhang, Xiangnan He

Offline reinforcement learning (RL), a technique that learns a policy offline from logged data without the need to interact with online environments, has become a favorable choice for decision-making processes like interactive recommendation. Offline RL faces the value overestimation problem. To address it, existing methods employ conservatism, e.g., by constraining the learned policy to stay close to behavior policies or by penalizing rarely visited state-action pairs. However, when such offline RL is applied to recommendation, it causes a severe Matthew effect, i.e., the rich get richer and the poor get poorer, by promoting popular items or categories while suppressing less popular ones. This is a notorious issue that needs to be addressed in practical recommender systems. In this paper, we aim to alleviate the Matthew effect in offline RL-based recommendation. Through theoretical analyses, we find that the conservatism of existing methods fails to pursue users' long-term satisfaction. This inspires us to add a penalty term that relaxes the pessimism on states with high entropy of the logging policy and indirectly penalizes actions leading to less diverse states. This leads to the main technical contribution of the work: the Debiased model-based Offline RL (DORL) method. Experiments show that DORL not only captures user interests well but also alleviates the Matthew effect. The implementation is available at https://github.com/chongminggao/DORL-codes.

* SIGIR 2023 Full Paper 
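The toy computation below illustrates the penalty idea in the abstract: a conservatism term driven by model uncertainty that is relaxed on states where the logging policy has high entropy. The coefficients, shapes, and uncertainty estimates are illustrative assumptions, not DORL's actual formulation.

```python
# Higher logging-policy entropy -> smaller effective penalty on that state.
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

rng = np.random.default_rng(0)
num_states, num_actions = 5, 4
model_uncertainty = rng.random((num_states, num_actions))    # e.g., ensemble disagreement
logging_policy = rng.dirichlet(np.ones(num_actions), size=num_states)

lambda_u, lambda_h = 1.0, 0.5                                # made-up coefficients
penalty = lambda_u * model_uncertainty - lambda_h * entropy(logging_policy)[:, None]

raw_reward = rng.random((num_states, num_actions))
penalized_reward = raw_reward - penalty                      # reward used by the model-based learner
print(penalized_reward.shape)
```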

Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models

Jul 03, 2023
Jinhao Duan, Hao Cheng, Shiqi Wang, Chenan Wang, Alex Zavalny, Renjing Xu, Bhavya Kailkhura, Kaidi Xu

Although Large Language Models (LLMs) have shown great potential in Natural Language Generation, it is still challenging to characterize the uncertainty of model generations, i.e., when users can trust model outputs. Our work is motivated by the observation that tokens generated by auto-regressive LLMs contribute unequally to the meaning of a generation, i.e., some tokens are more relevant (or representative) than others, yet all tokens are valued equally when estimating uncertainty. This stems from linguistic redundancy, where a few keywords are often sufficient to convey the meaning of a long sentence. We term these inequalities generative inequalities and investigate how they affect uncertainty estimation. Our results reveal that many tokens and sentences carrying limited semantics are weighted equally or even more heavily when estimating uncertainty. To tackle the biases posed by generative inequalities, we propose jointly Shifting Attention to more Relevant (SAR) components at both the token level and the sentence level when estimating uncertainty. We conduct experiments on popular "off-the-shelf" LLMs (e.g., OPT, LLaMA) with model sizes up to 30B and on powerful commercial LLMs (e.g., Davinci from OpenAI), across various free-form question-answering tasks. Experimental results and detailed analyses indicate the superior performance of SAR. Code is available at https://github.com/jinhaoduan/shifting-attention-to-relevance.
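The numeric sketch below shows the relevance-weighting idea at the token level: uncertainty is computed from token log-probabilities weighted by relevance rather than averaged uniformly. The relevance scores here are made up for illustration; SAR derives them from how much removing a token changes the semantics of the generation.

```python
# Keywords with low probability dominate the relevance-weighted uncertainty,
# while filler tokens with low relevance barely contribute.
import numpy as np

token_logprobs = np.array([-0.2, -3.5, -0.1, -2.8])   # e.g., "the", "Paris", "is", "capital"
relevance = np.array([0.05, 0.9, 0.05, 0.7])          # made-up relevance: keywords score high
weights = relevance / relevance.sum()

uniform_uncertainty = -token_logprobs.mean()            # treats every token equally
sar_uncertainty = -(weights * token_logprobs).sum()     # attention shifted to relevant tokens
print(f"uniform={uniform_uncertainty:.3f}  relevance-weighted={sar_uncertainty:.3f}")
```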

The Age of Synthetic Realities: Challenges and Opportunities

Jun 09, 2023
João Phillipe Cardenuto, Jing Yang, Rafael Padilha, Renjie Wan, Daniel Moreira, Haoliang Li, Shiqi Wang, Fernanda Andaló, Sébastien Marcel, Anderson Rocha

Synthetic realities are digital creations or augmentations that are contextually generated through the use of Artificial Intelligence (AI) methods, leveraging extensive amounts of data to construct new narratives or realities, regardless of the intent to deceive. In this paper, we delve into the concept of synthetic realities and their implications for Digital Forensics and society at large within the rapidly advancing field of AI. We highlight the crucial need for the development of forensic techniques capable of identifying harmful synthetic creations and distinguishing them from reality. This is especially important in scenarios involving the creation and dissemination of fake news, disinformation, and misinformation. Our focus extends to various forms of media, such as images, videos, audio, and text, as we examine how synthetic realities are crafted and explore approaches to detecting these malicious creations. Additionally, we shed light on the key research challenges that lie ahead in this area. This study is of paramount importance due to the rapid progress of AI generative techniques and their impact on the fundamental principles of Forensic Science.

Neuron Activation Coverage: Rethinking Out-of-distribution Detection and Generalization

Jun 05, 2023
Yibing Liu, Chris Xing Tian, Haoliang Li, Lei Ma, Shiqi Wang

The out-of-distribution (OOD) problem generally arises when neural networks encounter data that significantly deviates from the training data distribution, i.e., in-distribution (InD) data. In this paper, we study the OOD problem from a neuron activation view. We first formulate neuron activation states by considering both the neuron output and its influence on model decisions. We then propose the concept of neuron activation coverage (NAC), which characterizes neuron behaviors under InD and OOD data. Leveraging NAC, we show that 1) InD and OOD inputs can be naturally separated based on neuron behavior, which significantly eases the OOD detection problem and achieves a record-breaking performance of 0.03% FPR95 on ResNet-50, outperforming the previous best method by 20.67%; and 2) a positive correlation between NAC and model generalization ability consistently holds across architectures and datasets, which enables a NAC-based criterion for evaluating model robustness. Compared with the traditional validation criterion, the NAC-based criterion not only selects more robust models but also correlates more strongly with OOD test performance.
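The following is a simplified, hypothetical stand-in for the coverage idea: per-neuron histograms of activations collected on InD data define the covered states, and a new input is scored by how many of its activations fall into covered bins. NAC itself also folds in each neuron's influence on the model decision, which this sketch omits.

```python
# Inputs whose activations land mostly in bins never seen on InD data get low
# coverage scores and can be flagged as OOD.
import numpy as np

rng = np.random.default_rng(0)
num_neurons, num_bins = 128, 50
ind_activations = rng.normal(0.0, 1.0, size=(10_000, num_neurons))   # stand-in penultimate-layer outputs

# Per-neuron histograms over a fixed range approximate the InD "coverage".
edges = np.linspace(-5.0, 5.0, num_bins + 1)
coverage = np.stack([np.histogram(ind_activations[:, j], bins=edges)[0] > 0
                     for j in range(num_neurons)])                    # (neurons, bins) booleans

def coverage_score(x: np.ndarray) -> float:
    """Fraction of neurons whose activation lands in an InD-covered bin."""
    bins = np.clip(np.digitize(x, edges) - 1, 0, num_bins - 1)
    return float(coverage[np.arange(num_neurons), bins].mean())

ind_sample = rng.normal(0.0, 1.0, size=num_neurons)
ood_sample = rng.normal(4.0, 2.0, size=num_neurons)                   # shifted distribution
print(coverage_score(ind_sample), coverage_score(ood_sample))         # InD should score higher
```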

Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method

May 04, 2023
Yixuan Li, Bolin Chen, Baoliang Chen, Meng Wang, Shiqi Wang

Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in the spatial and temporal domains, spanning traditional hybrid coding frameworks to generative models, presents grand challenges for compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, the first attempt to systematically understand perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips at multiple compression levels, derived from 135 source videos with diversified content using six representative video codecs: two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure perceptual quality, considering the distinct content characteristics and temporal priors of face videos. Experimental results demonstrate its superior performance on the proposed CFVQA dataset. The benchmark is publicly available at: https://github.com/Yixuan423/Compressed-Face-Videos-Quality-Assessment.
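As a toy illustration only (not the FAVOR index), the sketch below shows one way a compressed face video could be scored against its source by combining a per-frame spatial fidelity term with a temporal-stability term, echoing the "content characteristics plus temporal priors" idea mentioned above. The metrics and weights are placeholders.

```python
# Toy full-reference video quality score: spatial PSNR-like term minus a
# flicker penalty; all choices here are illustrative assumptions.
import numpy as np

def toy_video_quality(ref: np.ndarray, dist: np.ndarray, w_temporal: float = 0.3) -> float:
    """ref, dist: (frames, H, W) arrays in [0, 1]."""
    # Spatial term: mean per-frame PSNR-like fidelity.
    mse = ((ref - dist) ** 2).mean(axis=(1, 2)) + 1e-12
    spatial = (10.0 * np.log10(1.0 / mse)).mean()
    # Temporal term: penalize frame-to-frame flicker that is absent in the reference.
    flicker = np.abs(np.diff(dist, axis=0) - np.diff(ref, axis=0)).mean()
    return float(spatial - w_temporal * 100.0 * flicker)

rng = np.random.default_rng(0)
ref = rng.random((8, 64, 64))
dist = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)   # stand-in compressed video
print(f"toy score: {toy_video_quality(ref, dist):.2f}")
```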