Vladimir Pavlovic

Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement

Sep 06, 2019
Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic

We propose a family of novel hierarchical Bayesian deep auto-encoder models capable of identifying disentangled factors of variability in data. While many recent attempts at factor disentanglement have focused on sophisticated learning objectives within the VAE framework, their choice of a standard normal as the latent factor prior is both suboptimal and detrimental to performance. Our key observation is that the disentangled latent variables responsible for major sources of variability, the relevant factors, can be more appropriately modeled using long-tail distributions. The typical Gaussian priors are, on the other hand, better suited for modeling nuisance factors. Motivated by this, we extend the VAE to a hierarchical Bayesian model by introducing hyper-priors on the variances of Gaussian latent priors, mimicking an infinite mixture, while maintaining the tractable learning and inference of traditional VAEs. This analysis highlights the importance of partitioning the latent dimensions into relevant factors and nuisances and treating the two groups differently. Our proposed models, dubbed Bayes-Factor-VAEs, are shown to outperform existing methods both quantitatively and qualitatively in terms of latent disentanglement across several challenging benchmark tasks.
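
The link between a hyper-prior on the variance and a long-tail prior can be illustrated numerically. The sketch below is not the authors' model: it assumes one specific hyper-prior (a Gamma distribution on the precision, equivalently an inverse-Gamma on the variance) and a hypothetical degrees-of-freedom value `nu=5`. Under that assumption the marginal over the latent is a Student-t, which puts far more mass in the tails than a standard normal.

```python
import numpy as np

def sample_gaussian(n, rng):
    """Standard normal latent prior, as used in a plain VAE."""
    return rng.standard_normal(n)

def sample_scale_mixture(n, rng, nu=5.0):
    """Zero-mean Gaussian whose precision has a Gamma hyper-prior.

    Assumption for illustration: precision ~ Gamma(nu/2, rate=nu/2).
    Marginalizing the precision gives a Student-t with nu degrees of
    freedom, i.e. an infinite mixture of zero-mean Gaussians.
    """
    precision = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    return rng.standard_normal(n) / np.sqrt(precision)

def tail_fraction(z, threshold=4.0):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return float(np.mean(np.abs(z) > threshold))

rng = np.random.default_rng(0)
gauss = sample_gaussian(200_000, rng)
heavy = sample_scale_mixture(200_000, rng)
print("Gaussian tail fraction:     ", tail_fraction(gauss))
print("Scale-mixture tail fraction:", tail_fraction(heavy))
```

The two printed fractions differ by orders of magnitude, which is the sense in which the hierarchical prior is long-tailed while each conditional remains Gaussian.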

* International Conference on Computer Vision (ICCV) 2019

Efficient Deep Gaussian Process Models for Variable-Sized Input

May 16, 2019
Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim

Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle variable-sized input and learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of the GP's latent vectors and the posterior of the DRF's internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://github.com/IssamLaradji/GP_DRF.
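
To make the random feature idea concrete, here is a minimal single-layer sketch of random Fourier features approximating an RBF kernel. It is not the GP-DRF model or the code in the linked repository; the function names, the lengthscale, and the feature count are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel value between two fixed-size vectors."""
    sq_dist = np.sum((x - y) ** 2)
    return float(np.exp(-sq_dist / (2.0 * lengthscale ** 2)))

def random_fourier_features(X, n_features, lengthscale, rng):
    """Map rows of X to features whose inner products approximate the RBF kernel.

    Random frequencies W are drawn from the kernel's spectral density
    (a Gaussian), and phases b uniformly from [0, 2*pi).
    """
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))            # two inputs of fixed dimension 3
Z = random_fourier_features(X, 5000, 1.0, rng)
approx = float(Z[0] @ Z[1])                # feature-space inner product
exact = rbf_kernel(X[0], X[1])
print("approximate:", approx, "exact:", exact)
```

Note that the feature map requires every input to have the same dimension `d`, which is one way to see why a random feature expansion on its own does not accept trees, graphs, or sequences of varying size.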

* Accepted in IJCNN 2019

The Art of Food: Meal Image Synthesis from Ingredients

May 09, 2019
Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

In this work we propose a new computational framework, based on generative deep models, for synthesis of photo-realistic meal images from textual descriptions of their ingredients. Previous works on synthesis of images from text typically rely on pre-trained text models to extract text features, followed by a generative adversarial network (GAN) that generates realistic images conditioned on the text features. These works mainly focus on generating spatially compact and well-defined categories of objects, such as birds or flowers. In contrast, meal images are significantly more complex, consisting of multiple ingredients whose appearance and spatial qualities are further modified by cooking methods. We propose a method that first builds an attention-based ingredients-image association model, which is then used to condition a generative neural network tasked with synthesizing meal images. Furthermore, a cycle-consistent constraint is added to further improve image quality and control appearance. Extensive experiments show our model is able to generate meal images corresponding to the ingredients, which could be used to augment existing datasets for solving other computational food analysis problems.

* 12 pages, 6 figures, 2 tables, under review as a conference paper at BMVC 2019

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

May 06, 2019
Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient switchable online adaptation that gradually captures the identity of the tracked subject and rapidly constructs a suitable face model when the subject changes. Moreover, unlike prior art that employed ICP-based facial pose estimation, to improve robustness to occlusions, we propose a ray visibility constraint that regularizes the pose based on the face model's visibility with respect to the input point cloud. Ablation studies and experimental results on the Biwi and ICT-3DHP datasets demonstrate that the proposed framework is effective and outperforms competing state-of-the-art depth-based methods.

Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach

Feb 23, 2019
Minyoung Kim, Pritish Sahu, Behnam Gholami, Vladimir Pavlovic

In unsupervised domain adaptation, it is widely known that the target domain error can be provably reduced by having a shared input representation that makes the source and target domains indistinguishable from each other. Recent work has shown that matching the marginal input distributions is not enough: aligning the output (class) distributions is also critical. The latter can be achieved by minimizing the maximum discrepancy of predictors (classifiers). In this paper, we adopt this principle, but propose a more systematic and effective way to achieve hypothesis consistency via Gaussian processes (GP). The GP allows us to define/induce a hypothesis space of the classifiers from the posterior distribution of the latent random functions, turning the learning into a simple large-margin posterior separation problem, far easier to solve than previous approaches based on adversarial minimax optimization. We formulate a learning objective that effectively pushes the posterior to minimize the maximum discrepancy. This is further shown to be equivalent to maximizing margins and minimizing uncertainty of the class predictions in the target domain, a well-established principle in classical (semi-)supervised learning. Empirical results demonstrate that our approach is comparable or superior to the existing methods on several benchmark domain adaptation datasets.
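
The quantity "maximum discrepancy of predictors" can be sketched for a finite set of classifiers. This is only an illustration, not the paper's GP formulation: it assumes the discrepancy between two classifiers is the mean absolute difference of their class probabilities on target points, and the function names and toy predictions are hypothetical.

```python
from itertools import combinations

import numpy as np

def discrepancy(p, q):
    """Mean absolute difference between two sets of class probabilities."""
    return float(np.mean(np.abs(p - q)))

def max_discrepancy(predictions):
    """Largest pairwise discrepancy among a set of classifiers.

    predictions has shape (n_classifiers, n_target_points, n_classes).
    """
    return max(discrepancy(p, q) for p, q in combinations(predictions, 2))

# Three toy classifiers evaluated on two unlabeled target points.
preds = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.5, 0.5], [0.6, 0.4]],
])
print("maximum discrepancy:", max_discrepancy(preds))
```

Driving this value down forces all plausible classifiers to agree on the target points, which is the intuition behind the link to large margins and low predictive uncertainty stated in the abstract.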

Relevance Factor VAE: Learning and Identifying Disentangled Factors

Feb 05, 2019
Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic

We propose a novel VAE-based deep auto-encoder model that can learn disentangled latent representations in a fully unsupervised manner, endowed with the ability to identify all meaningful sources of variation and their cardinality. Our model, dubbed Relevance-Factor-VAE, leverages the total correlation (TC) in the latent space to achieve the disentanglement goal, but also addresses the key issue of existing approaches which cannot distinguish between meaningful and nuisance factors of latent variation, often the source of considerable degradation in disentanglement performance. We tackle this issue by introducing the so-called relevance indicator variables that can be automatically learned from data, together with the VAE parameters. Our model effectively focuses the TC loss onto the relevant factors only by tolerating large prior KL divergences, a desideratum justified by our semi-parametric theoretical analysis. Using a suite of disentanglement metrics, including a newly proposed one, as well as qualitative evidence, we demonstrate that our model outperforms existing methods across several challenging benchmark datasets.
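
Total correlation measures how far a joint distribution is from the product of its marginals. The abstract does not say how the paper estimates it, so the sketch below uses a simplifying assumption: the latent code is jointly Gaussian, in which case TC has the closed form 0.5 * (sum_j log Sigma_jj - log det Sigma). The function name is hypothetical.

```python
import numpy as np

def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a zero-mean Gaussian with covariance cov.

    TC = sum_j H(z_j) - H(z) = 0.5 * (sum_j log cov_jj - log det cov).
    It is zero exactly when the dimensions are independent.
    """
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return float(0.5 * (np.sum(np.log(np.diag(cov))) - logdet))

independent = np.diag([1.0, 2.0, 0.5])
correlated = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
print("independent latents:", gaussian_total_correlation(independent))
print("correlated latents: ", gaussian_total_correlation(correlated))
```

A disentanglement objective that penalizes TC pushes the second case toward the first; restricting the penalty to the relevant dimensions, as the abstract describes, would correspond to evaluating it on a sub-block of the covariance.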

Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach

Oct 26, 2018
Behnam Gholami, Pritish Sahu, Ognjen Rudovic, Konstantinos Bousmalis, Vladimir Pavlovic

Unsupervised domain adaptation (uDA) models focus on pairwise adaptation settings where there is a single, labeled, source and a single target domain. However, in many real-world settings one seeks to adapt to multiple, but somewhat similar, target domains. Applying pairwise adaptation approaches to this setting may be suboptimal, as they fail to leverage shared information among multiple domains. In this work we propose an information theoretic approach for domain adaptation in the novel context of multiple target domains with unlabeled instances and one source domain with labeled instances. Our model aims to find a shared latent space common to all domains, while simultaneously accounting for the remaining private, domain-specific factors. Disentanglement of shared and private information is accomplished using a unified information-theoretic approach, which also serves to establish a stronger link between the latent representations and the observed data. The resulting model, accompanied by an efficient optimization algorithm, allows simultaneous adaptation from a single source to multiple target domains. We test our approach on three challenging publicly-available datasets, showing that it outperforms several popular domain adaptation methods.

* 19 pages, 5 Figures, 5 Tables

Sketch-Based Face Editing in Videos Using Identity Deformation Transfer

May 31, 2018
Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas

We address the problem of using hand-drawn sketches to edit facial identity throughout a video, for example by enlarging the shape or modifying the position of the eyes or mouth. This task is formulated as a 3D face model reconstruction and deformation problem. We first introduce a two-stage real-time 3D face model fitting scheme to recover the facial identity and expressions from the video. The user's editing intention is recognized from input sketches as a set of facial modifications. Then a novel identity deformation algorithm is proposed to transfer these facial deformations from 2D space to the 3D facial identity directly, while preserving the facial expressions. After an optional stage for further refining the 3D face model, these changes are propagated to the whole video with the modified identity. Both the user study and experimental results demonstrate that our sketching framework can help users effectively edit facial identities in videos while ensuring high consistency and fidelity.

Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

Mar 28, 2018
Hai X. Pham, Yuting Wang, Vladimir Pavlovic

This paper presents Generative Adversarial Talking Head (GATH), a novel deep generative neural network that enables fully automatic facial expression synthesis of an arbitrary portrait with continuous action unit (AU) coefficients. Specifically, our model directly manipulates image pixels to make the unseen subject in the still photo express various emotions controlled by values of facial AU coefficients, while maintaining her personal characteristics, such as facial geometry, skin color and hair style, as well as the original surrounding background. In contrast to prior work, GATH is purely data-driven and it requires neither a statistical face model nor image processing tricks to enact facial deformations. Additionally, our model is trained from unpaired data, where the input image, with its auxiliary identity label taken from an abundance of still photos in the wild, and the target frame are from different persons. In order to effectively learn such a model, we propose a novel weakly supervised adversarial learning framework that consists of a generator, a discriminator, a classifier and an action unit estimator. Our work gives rise to template-and-target-free expression editing, where still faces can be effortlessly animated with arbitrary AU coefficients provided by the user.

* Fix typos, add youtube link of supplementary video

End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

Dec 07, 2017
Hai X. Pham, Yuting Wang, Vladimir Pavlovic

We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and head rotations to drive a 3D blendshape face model. In particular, our deep model is able to learn the latent representations of time-varying contextual information and affective states within the speech. Hence, our model not only activates appropriate facial action units at inference to depict different utterance-generating actions, in the form of lip movements, but also, without any assumption, automatically estimates the emotional intensity of the speaker and reproduces her ever-changing affective states by adjusting the strength of facial unit activations. For example, in happy speech, the mouth opens wider than normal, while other facial units are relaxed; or in a surprised state, both eyebrows rise higher. Experiments on a diverse audiovisual corpus of different actors across a wide range of emotional states show interesting and promising results of our approach. Being speaker-independent, our generalized model is readily applicable to various tasks in human-machine interaction and animation.
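
How activations "drive a 3D blendshape face model" can be shown with the standard linear blendshape formula: deformed vertices equal the neutral mesh plus a weighted sum of per-shape offsets. The sketch below is a generic illustration, not the authors' pipeline. The two-vertex mesh, the shape names, and the clipping of weights to [0, 1] are assumptions, and head rotation is left out.

```python
import numpy as np

def blend(neutral, deltas, weights):
    """Apply linear blendshapes to a neutral mesh.

    neutral: (V, 3) vertex positions of the neutral face.
    deltas:  (K, V, 3) per-blendshape vertex offsets from neutral.
    weights: (K,) activations, clipped to [0, 1] (an assumed convention).
    """
    weights = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy mesh with two vertices and two hypothetical blendshapes.
neutral = np.zeros((2, 3))
deltas = np.zeros((2, 2, 3))
deltas[0, 0, 1] = 1.0   # "jaw open": moves vertex 0 along y
deltas[1, 1, 1] = 1.0   # "brow raise": moves vertex 1 along y

# Activations such as a network might predict for one audio frame.
frame = blend(neutral, deltas, weights=[0.5, 1.0])
print(frame)
```

In this picture, a stronger emotional intensity simply corresponds to larger weights on the relevant shapes, for example a wider mouth opening or higher eyebrows.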
