
Zhiwu Huang


Adaptive Riemannian Metrics on SPD Manifolds

Mar 26, 2023
Ziheng Chen, Tianyang Xu, Zhiwu Huang, Yue Song, Xiao-Jun Wu, Nicu Sebe

Figures 1-4 for Adaptive Riemannian Metrics on SPD Manifolds

Symmetric Positive Definite (SPD) matrices have received wide attention in machine learning due to their intrinsic capacity to encode underlying structural correlations in data. To reflect the non-Euclidean geometry of SPD manifolds, many successful Riemannian metrics have been proposed. However, existing fixed metric tensors may lead to sub-optimal performance for SPD matrix learning, especially in SPD neural networks. To remedy this limitation, we leverage the idea of pullback and propose adaptive Riemannian metrics for SPD manifolds. Moreover, we present comprehensive theories for our metrics. Experiments on three datasets demonstrate that, equipped with the proposed metrics, SPD networks can exhibit superior performance.
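The pullback construction the abstract builds on can be stated in one line: a diffeomorphism $\phi$ from $\mathrm{SPD}(n)$ into a Riemannian manifold $(\mathcal{M}, \tilde{g})$ induces a metric on $\mathrm{SPD}(n)$. As a sketch (the paper's exact choice and parameterization of $\phi$ are not shown here):

```latex
g^{\phi}_{P}(U, V) = \tilde{g}_{\phi(P)}\big(\mathrm{d}\phi_{P}(U),\, \mathrm{d}\phi_{P}(V)\big),
\qquad U, V \in T_{P}\,\mathrm{SPD}(n).
```

With $\phi = \log$ (the matrix logarithm) into Euclidean space, this recovers the familiar Log-Euclidean metric; letting $\phi$ carry learnable parameters is what would make the induced metric adaptive.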


Freestyle Layout-to-Image Synthesis

Mar 25, 2023
Han Xue, Zhiwu Huang, Qianru Sun, Li Song, Wenjun Zhang

Figures 1-4 for Freestyle Layout-to-Image Synthesis

Typical layout-to-image synthesis (LIS) models generate images for a closed set of semantic classes, e.g., the 182 common objects in COCO-Stuff. In this work, we explore the freestyle capability of such models, i.e., how far they can go in generating unseen semantics (e.g., classes, attributes, and styles) onto a given layout, and call the task Freestyle LIS (FLIS). Thanks to the development of large-scale pre-trained language-image models, a number of discriminative models (e.g., for image classification and object detection) trained on limited base classes have been empowered with the ability of unseen class prediction. Inspired by this, we opt to leverage large-scale pre-trained text-to-image diffusion models to achieve the generation of unseen semantics. The key challenge of FLIS is how to enable the diffusion model to synthesize images from a specific layout which very likely violates its pre-learned knowledge, e.g., the model never sees "a unicorn sitting on a bench" during pre-training. To this end, we introduce a new module called Rectified Cross-Attention (RCA) that can be conveniently plugged into the diffusion model to integrate semantic masks. This "plug-in" is applied in each cross-attention layer of the model to rectify the attention maps between image and text tokens. The key idea of RCA is to force each text token to act only on the pixels in a specified region, allowing us to freely put a wide variety of semantics from pre-trained knowledge (which is general) onto the given layout (which is specific). Extensive experiments show that the proposed diffusion network produces realistic and freestyle layout-to-image generation results with diverse text inputs, and has high potential to spawn a range of interesting applications. Code is available at https://github.com/essunny310/FreestyleNet.
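As a rough illustration of the rectification idea (not the paper's implementation; the single-head NumPy setup and all names are assumptions), attention logits from pixels outside a token's layout region can simply be masked before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rectified_cross_attention(q, k, v, token_masks):
    """Cross-attention where each text token can only act on the
    pixels inside its layout region.

    q: (P, d) pixel queries; k, v: (T, d) text-token keys/values;
    token_masks: (T, P) binary masks, 1 where token t covers pixel p.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                    # (P, T)
    # Rectify: forbid attention between a pixel and any token
    # whose layout region does not contain that pixel.
    logits = np.where(token_masks.T > 0, logits, -1e9)
    attn = softmax(logits, axis=-1)                  # (P, T)
    return attn @ v                                  # (P, d)
```

If a pixel belongs to exactly one token's region, its output is driven entirely by that token's value vector, which is the behavior the abstract describes.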

* Accepted to CVPR 2023 

Benchmarking Deepart Detection

Feb 28, 2023
Yabin Wang, Zhiwu Huang, Xiaopeng Hong

Figures 1-4 for Benchmarking Deepart Detection

Deepfake technologies have been blurring the boundary between the real and the unreal, potentially enabling malicious events. By leveraging newly emerged deepfake technologies, researchers have begun to create deepfake artworks (deeparts), which further close the gap between reality and fantasy. To address the ethical questions that may arise, this paper establishes a deepart detection database (DDDB) that consists of a set of high-quality conventional art images (conarts) and five sets of deepart images generated by five state-of-the-art deepfake models. This database enables us to explore once-for-all deepart detection and continual deepart detection. For these two new problems, we suggest four benchmark evaluations and four families of solutions on the constructed DDDB. The comprehensive study demonstrates the effectiveness of the proposed solutions on the established benchmark, paving the way to more interesting directions of deepart detection. The constructed benchmark dataset and the source code will be made publicly available.


Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference

Nov 29, 2022
Yabin Wang, Zhiheng Ma, Zhiwu Huang, Yaowei Wang, Zhou Su, Xiaopeng Hong

Figures 1-4 for Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference

This paper focuses on the prevalent performance imbalance across the stages of incremental learning. To avoid obvious stage-learning bottlenecks, we propose a brand-new stage-isolation based incremental learning framework, which leverages a series of stage-isolated classifiers to perform the learning task of each stage without interference from the others. Concretely, to aggregate multiple stage classifiers into a uniform one impartially, we first introduce a temperature-controlled energy metric to indicate the confidence score levels of the stage classifiers. We then propose an anchor-based energy self-normalization strategy to ensure the stage classifiers work at the same energy level. Finally, we design a voting-based inference augmentation strategy for robust inference. The proposed method is rehearsal-free and can work for almost all continual learning scenarios. We evaluate the proposed method on four large benchmarks. Extensive results demonstrate the superiority of the proposed method in establishing new state-of-the-art overall performance. \emph{Code is available at} \url{https://github.com/iamwangyabin/ESN}.
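A minimal sketch of a temperature-controlled energy score and the anchor-based renormalization it enables, assuming the standard energy definition E_T(x) = -T * logsumexp(f(x)/T); the paper's exact formulation and the function names here are assumptions:

```python
import numpy as np

def temperature_energy(logits, T=1.0):
    """Energy score of classifier logits: E_T(x) = -T * logsumexp(logits / T).

    Lower energy corresponds to higher confidence. Computed with the
    max-shift trick for numerical stability.
    """
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def align_to_anchor(logits, anchor_energy, T=1.0):
    """Shift a stage classifier's logits so its mean energy matches a
    shared anchor level.  Adding a constant b to every logit shifts the
    energy by exactly -b, so a scalar offset suffices.
    """
    e = temperature_energy(logits, T)
    return logits + (e.mean() - anchor_energy)
```

Aligning every stage classifier to the same anchor energy level is one way to make their confidence scores comparable before aggregating them into a single classifier.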

* This is the accepted version of the Paper & Supp to appear in AAAI 2023. Please cite the final published version. Code is available at https://github.com/iamwangyabin/ESN 

S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning

Jul 26, 2022
Yabin Wang, Zhiwu Huang, Xiaopeng Hong

Figures 1-4 for S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning

State-of-the-art deep neural networks still struggle to address the catastrophic forgetting problem in continual learning. In this paper, we propose one simple paradigm (named S-Prompting) and two concrete approaches to greatly reduce forgetting in one of the most typical continual learning scenarios, i.e., domain incremental learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained transformers, avoiding the use of exemplars that commonly appear in conventional methods. This results in a win-win game where the prompting can achieve the best for each domain. The independent prompting across domains requires only a single cross-entropy loss for training and one simple K-NN operation as a domain identifier for inference. The learning paradigm derives an image prompt learning approach and a brand-new language-image prompt learning approach. Enjoying excellent scalability (a 0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (an average of about 30%) over the best of the state-of-the-art exemplar-free methods for three standard DIL tasks, and even surpasses the best of them by about 6% relatively on average when they use exemplars.
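The inference-time K-NN domain identifier can be pictured as follows. The use of per-domain feature centroids (e.g., K-Means centers of each domain's pre-trained-transformer features) is an assumption for illustration, as are all names in this sketch:

```python
import numpy as np

def identify_domain(feature, domain_centroids, k=5):
    """Pick the domain whose stored centroids are nearest to a test feature;
    the selected domain's prompts would then be used for prediction.

    domain_centroids: dict mapping domain id -> (n_centroids, d) array.
    """
    all_centers, all_domains = [], []
    for dom, centers in domain_centroids.items():
        all_centers.append(centers)
        all_domains += [dom] * len(centers)
    all_centers = np.vstack(all_centers)
    dists = np.linalg.norm(all_centers - feature, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [all_domains[i] for i in nearest]
    # Majority vote over the k nearest centroids selects the prompt set.
    return max(set(votes), key=votes.count)
```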


A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials

May 14, 2022
Chuqiao Li, Zhiwu Huang, Danda Pani Paudel, Yabin Wang, Mohamad Shahbazi, Xiaopeng Hong, Luc Van Gool

Figures 1-4 for A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials

A number of benchmarks and techniques have emerged for the detection of deepfakes. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate such wild scenes, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CDDB designs multiple evaluations of detection over easy, hard, and long sequences of deepfake tasks, with a set of appropriate measures. In addition, we exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem. We evaluate several methods, including the adapted ones, on the proposed CDDB. Within the proposed benchmark, we explore some commonly known essentials of standard continual learning. Our study provides new insights on these essentials in the context of continual deepfake detection. The suggested CDDB is clearly more challenging than the existing benchmarks, and thus offers a suitable evaluation avenue for future research. Our benchmark dataset and the source code will be made publicly available.

* some typos are corrected 

Multi-agent Actor-Critic with Time Dynamical Opponent Model

Apr 12, 2022
Yuan Tian, Klaus-Rudolf Kladny, Qin Wang, Zhiwu Huang, Olga Fink

Figures 1-4 for Multi-agent Actor-Critic with Time Dynamical Opponent Model

In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other. Since the agents adapt their policies during learning, not only does the behavior of a single agent become non-stationary, but so does the environment as perceived by the agent. This makes it particularly challenging to perform policy improvement. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward and introduce a novel \textit{Time Dynamical Opponent Model} (TDOM) to encode the knowledge that the opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound of the log objective of an individual agent and further propose \textit{Multi-Agent Actor-Critic with Time Dynamical Opponent Model} (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and the Multi-agent Particle Environment. We show empirically that TDOM achieves superior opponent behavior prediction during test time. The proposed TDOM-AC methodology outperforms state-of-the-art Actor-Critic methods in the performed experiments in cooperative and \textbf{especially} in mixed cooperative-competitive environments. TDOM-AC also results in more stable training and faster convergence.


MSNet: A Deep Multi-scale Submanifold Network for Visual Classification

Jan 29, 2022
Ziheng Chen, Xiao-Jun Wu, Tianyang Xu, Rui Wang, Zhiwu Huang, Josef Kittler

Figures 1-4 for MSNet: A Deep Multi-scale Submanifold Network for Visual Classification

The Symmetric Positive Definite (SPD) matrix has received wide attention as a tool for visual data representation in computer vision. Although there have been many attempts to develop effective deep architectures for data processing on the Riemannian manifold of SPD matrices, very few solutions explicitly mine the local geometrical information in deep SPD feature representations. While CNNs have demonstrated the potential of hierarchical local pattern extraction even for SPD-represented data, we argue that it is of utmost importance to ensure the preservation of local geometric information in SPD networks. Accordingly, in this work we propose an SPD network designed with this objective in mind. In particular, we propose an architecture, referred to as MSNet, which fuses geometrical multi-scale information. We first analyse the convolution operator commonly used for mapping local information in Euclidean deep networks from the perspective of a higher level of abstraction afforded by category theory. Based on this analysis, we postulate a submanifold selection principle to guide the design of our MSNet. In particular, we use it to design a submanifold fusion block that takes advantage of the rich local geometry encoded in the network layers. The experiments involving multiple visual tasks show that our algorithm outperforms most Riemannian SOTA competitors.


Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo

Oct 11, 2021
Francesco Sarno, Suryansh Kumar, Berk Kaya, Zhiwu Huang, Vittorio Ferrari, Luc Van Gool

Figures 1-4 for Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo

We present an automated machine learning approach for uncalibrated photometric stereo (PS). Our work aims at discovering lightweight and computationally efficient PS neural networks with excellent surface normal accuracy. Unlike previous uncalibrated deep PS networks, which are handcrafted and carefully tuned, we leverage a differentiable neural architecture search (NAS) strategy to find uncalibrated PS architectures automatically. We begin by defining discrete search spaces for a light calibration network and a normal estimation network, respectively. We then perform a continuous relaxation of these search spaces and present a gradient-based optimization strategy to find efficient light calibration and normal estimation networks. Directly applying the NAS methodology to uncalibrated PS is not straightforward, as certain task-specific constraints must be satisfied, which we impose explicitly. Moreover, we search for and train the two networks separately to account for the Generalized Bas-Relief (GBR) ambiguity. Extensive experiments on the DiLiGenT dataset show that the automatically searched neural architectures compare favorably with the state-of-the-art uncalibrated PS methods while having a lower memory footprint.
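The continuous relaxation mentioned above follows the DARTS recipe: replace the discrete choice among candidate operations on an edge with a softmax-weighted mixture, so the architecture parameters become differentiable and can be optimized by gradient descent. A toy sketch (the candidate ops and names are placeholders, not the paper's actual search space):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mixed_op(x, alphas, ops):
    """DARTS-style continuous relaxation of one edge of a search cell:
    the output is a softmax-weighted sum over all candidate operations,
    so the architecture weights `alphas` admit gradients.
    """
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, ops))

# Toy candidate operations for one edge.
ops = [lambda x: x,        # identity / skip connection
       lambda x: 0.0 * x,  # zero op (prunes the edge)
       lambda x: 2.0 * x]  # stand-in for a learned op, e.g. a conv
```

After search, the relaxation is typically discretized by keeping, on each edge, the operation with the largest architecture weight.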

* Accepted for publication at IEEE/CVF, WACV 2022. (11 pages) 