Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaoming Liu

Reverse Engineering of Imperceptible Adversarial Image Perturbations

Apr 01, 2022

Yifan Gong, Yuguang Yao, Yize Li, Yimeng Zhang, Xiaoming Liu, Xue Lin, Sijia Liu

Figure 1 for Reverse Engineering of Imperceptible Adversarial Image Perturbations

Figure 2 for Reverse Engineering of Imperceptible Adversarial Image Perturbations

Figure 3 for Reverse Engineering of Imperceptible Adversarial Image Perturbations

Figure 4 for Reverse Engineering of Imperceptible Adversarial Image Perturbations

Abstract:It has been well recognized that neural network based image classifiers are easily fooled by images with tiny perturbations crafted by an adversary. There has been a vast volume of research to generate and defend such adversarial attacks. However, the following problem is left unexplored: How to reverse-engineer adversarial perturbations from an adversarial image? This leads to a new adversarial learning paradigm--Reverse Engineering of Deceptions (RED). If successful, RED allows us to estimate adversarial perturbations and recover the original images. However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective. For example, the pure image denoising method may overfit to minimizing the reconstruction error but hardly preserve the classification properties of the true adversarial perturbations. To tackle this challenge, we formalize the RED problem and identify a set of principles crucial to the RED approach design. Particularly, we find that prediction alignment and proper data augmentation (in terms of spatial transformations) are two criteria to achieve a generalizable RED approach. By integrating these RED principles with image denoising, we propose a new Class-Discriminative Denoising based RED framework, termed CDD-RED. Extensive experiments demonstrate the effectiveness of CDD-RED under different evaluation metrics (ranging from the pixel-level, prediction-level to the attribution-level alignment) and a variety of attack generation methods (e.g., FGSM, PGD, CW, AutoAttack, and adaptive attacks).

Via

Access Paper or Ask Questions

Proactive Image Manipulation Detection

Mar 31, 2022

Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, Xiaoming Liu

Figure 1 for Proactive Image Manipulation Detection

Figure 2 for Proactive Image Manipulation Detection

Figure 3 for Proactive Image Manipulation Detection

Figure 4 for Proactive Image Manipulation Detection

Abstract:Image manipulation detection algorithms are often trained to discriminate between images manipulated with particular Generative Models (GMs) and genuine/real images, yet generalize poorly to images manipulated with GMs unseen in the training. Conventional detection algorithms receive an input image passively. By contrast, we propose a proactive scheme to image manipulation detection. Our key enabling technique is to estimate a set of templates which when added onto the real image would lead to more accurate manipulation detection. That is, a template protected real image, and its manipulated version, is better discriminated compared to the original real image vs. its manipulated one. These templates are estimated using certain constraints based on the desired properties of templates. For image manipulation detection, our proposed approach outperforms the prior work by an average precision of 16% for CycleGAN and 32% for GauGAN. Our approach is generalizable to a variety of GMs showing an improvement over prior work by an average precision of 10% averaged across 12 GMs. Our code is available at https://www.github.com/vishal3477/proactive_IMD.

* Published at CVPR 2022

Via

Access Paper or Ask Questions

Face Relighting with Geometrically Consistent Shadows

Mar 30, 2022

Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu

Figure 1 for Face Relighting with Geometrically Consistent Shadows

Figure 2 for Face Relighting with Geometrically Consistent Shadows

Figure 3 for Face Relighting with Geometrically Consistent Shadows

Figure 4 for Face Relighting with Geometrically Consistent Shadows

Abstract:Most face relighting methods are able to handle diffuse shadows, but struggle to handle hard shadows, such as those cast by the nose. Methods that propose techniques for handling hard shadows often do not produce geometrically consistent shadows since they do not directly leverage the estimated face geometry while synthesizing them. We propose a novel differentiable algorithm for synthesizing hard shadows based on ray tracing, which we incorporate into training our face relighting model. Our proposed algorithm directly utilizes the estimated face geometry to synthesize geometrically consistent hard shadows. We demonstrate through quantitative and qualitative experiments on Multi-PIE and FFHQ that our method produces more geometrically consistent shadows than previous face relighting methods while also achieving state-of-the-art face relighting performance under directional lighting. In addition, we demonstrate that our differentiable hard shadow modeling improves the quality of the estimated face geometry over diffuse shading models.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions

The State of Aerial Surveillance: A Survey

Jan 13, 2022

Kien Nguyen, Clinton Fookes, Sridha Sridharan, Yingli Tian, Feng Liu, Xiaoming Liu, Arun Ross

Figure 1 for The State of Aerial Surveillance: A Survey

Figure 2 for The State of Aerial Surveillance: A Survey

Figure 3 for The State of Aerial Surveillance: A Survey

Figure 4 for The State of Aerial Surveillance: A Survey

Abstract:The rapid emergence of airborne platforms and imaging sensors are enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss unique challenges in performing these tasks in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, and delve deep into the approaches in the aerial literature and investigate how they presently address the aerial challenges. We conclude the paper with discussion on the missing gaps and open research questions to inform future research avenues.

Via

Access Paper or Ask Questions

Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

Nov 04, 2021

Feng Liu, Xiaoming Liu

Figure 1 for Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

Figure 2 for Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

Figure 3 for Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

Figure 4 for Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

Abstract:Inferring 3D locations and shapes of multiple objects from a single 2D image is a long-standing objective of computer vision. Most of the existing works either predict one of these 3D properties or focus on solving both for a single object. One fundamental challenge lies in how to learn an effective representation of the image that is well-suited for 3D detection and reconstruction. In this work, we propose to learn a regular grid of 3D voxel features from the input image which is aligned with 3D scene space via a 3D feature lifting operator. Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space. Moreover, we devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation, which enables fine detail reconstruction and one order of magnitude faster inference than prior methods. With complementary supervision from both 3D detection and reconstruction, one enables the 3D voxel features to be geometry and context preserving, benefiting both tasks.The effectiveness of our approach is demonstrated through 3D detection and reconstruction in single object and multiple object scenarios.

* NeurIPS 2021

Via

Access Paper or Ask Questions

MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

Nov 01, 2021

Safa C. Medin, Bernhard Egger, Anoop Cherian, Ye Wang, Joshua B. Tenenbaum, Xiaoming Liu, Tim K. Marks

Figure 1 for MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

Figure 2 for MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

Figure 3 for MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

Figure 4 for MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

Abstract:Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis. While methods that use style-based GANs can generate strikingly photorealistic face images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that a priori models physical attributes of the face such as 3D shape, albedo, pose, and lighting explicitly, thus providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we couple with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.

Via

Access Paper or Ask Questions

Full-Velocity Radar Returns by Radar-Camera Fusion

Aug 24, 2021

Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, Praveen Narayanan

Figure 1 for Full-Velocity Radar Returns by Radar-Camera Fusion

Figure 2 for Full-Velocity Radar Returns by Radar-Camera Fusion

Figure 3 for Full-Velocity Radar Returns by Radar-Camera Fusion

Figure 4 for Full-Velocity Radar Returns by Radar-Camera Fusion

Abstract:A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points. However, the missing tangential velocity component hampers object velocity estimation as well as temporal integration of radar sweeps in dynamic scenes. Recognizing that fusing camera with radar provides complementary information to radar, in this paper we present a closed-form solution for the point-wise, full-velocity estimate of Doppler returns using the corresponding optical flow from camera images. Additionally, we address the association problem between radar returns and camera images with a neural network that is trained to estimate radar-camera correspondences. Experimental results on the nuScenes dataset verify the validity of the method and show significant improvements over the state-of-the-art in velocity estimation and accumulation of radar points.

* International Conference on Computer Vision, 2021

Via

Access Paper or Ask Questions

Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

Jun 15, 2021

Vishal Asnani, Xi Yin, Tal Hassner, Xiaoming Liu

Figure 1 for Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

Figure 2 for Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

Figure 3 for Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

Figure 4 for Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

Abstract:State-of-the-art (SOTA) Generative Models (GMs) can synthesize photo-realistic images that are hard for humans to distinguish from genuine photos. We propose to perform reverse engineering of GMs to infer the model hyperparameters from the images generated by these models. We define a novel problem, "model parsing", as estimating GM network architectures and training loss functions by examining their generated images -- a task seemingly impossible for human beings. To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints to encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints. To evaluate our approach, we collect a fake image dataset with $100$K images generated by $100$ GMs. Extensive experiments show encouraging results in parsing the hyperparameters of the unseen models. Finally, our fingerprint estimation can be leveraged for deepfake detection and image attribution, as we show by reporting SOTA results on both the recent Celeb-DF and image attribution benchmarks.

Via

Access Paper or Ask Questions

Radar-Camera Pixel Depth Association for Depth Completion

Jun 05, 2021

Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, Praveen Narayanan

Figure 1 for Radar-Camera Pixel Depth Association for Depth Completion

Figure 2 for Radar-Camera Pixel Depth Association for Depth Completion

Figure 3 for Radar-Camera Pixel Depth Association for Depth Completion

Figure 4 for Radar-Camera Pixel Depth Association for Depth Completion

Abstract:While radar and video data can be readily fused at the detection level, fusing them at the pixel level is potentially more beneficial. This is also more challenging in part due to the sparsity of radar, but also because automotive radar beams are much wider than a typical pixel combined with a large baseline between camera and radar, which results in poor association between radar pixels and color pixel. A consequence is that depth completion methods designed for LiDAR and video fare poorly for radar and video. Here we propose a radar-to-pixel association stage which learns a mapping from radar returns to pixels. This mapping also serves to densify radar returns. Using this as a first stage, followed by a more traditional depth completion method, we are able to achieve image-guided depth completion with radar and video. We demonstrate performance superior to camera and radar alone on the nuScenes dataset. Our source code is available at https://github.com/longyunf/rc-pda.

* IEEE Conference on Computer Vision and Pattern Recognition, 2021

Via

Access Paper or Ask Questions

Riggable 3D Face Reconstruction via In-Network Optimization

Apr 08, 2021

Ziqian Bai, Zhaopeng Cui, Xiaoming Liu, Ping Tan

Figure 1 for Riggable 3D Face Reconstruction via In-Network Optimization

Figure 2 for Riggable 3D Face Reconstruction via In-Network Optimization

Figure 3 for Riggable 3D Face Reconstruction via In-Network Optimization

Figure 4 for Riggable 3D Face Reconstruction via In-Network Optimization

Abstract:This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations. To achieve this goal, we design an end-to-end trainable network embedded with a differentiable in-network optimization. The network first parameterizes the face rig as a compact latent code with a neural decoder, and then estimates the latent code as well as per-image parameters via a learnable optimization. By estimating a personalized face rig, our method goes beyond static reconstructions and enables downstream applications such as video retargeting. In-network optimization explicitly enforces constraints derived from the first principles, thus introduces additional priors than regression-based methods. Finally, data-driven priors from deep learning are utilized to constrain the ill-posed monocular setting and ease the optimization difficulty. Experiments demonstrate that our method achieves SOTA reconstruction accuracy, reasonable robustness and generalization ability, and supports standard face rig applications.

* CVPR2021. Code: https://github.com/zqbai-jeremy/INORig Camera Ready Paper: https://zqbai-jeremy.github.io/files/INORig.pdf Camera Ready Supp: https://zqbai-jeremy.github.io/files/INORig_supp.pdf

Via

Access Paper or Ask Questions