"photo": models, code, and papers

Pixel-based Facial Expression Synthesis

Oct 27, 2020
Arbish Akram, Nazar Khan

Facial expression synthesis has achieved remarkable advances with the advent of Generative Adversarial Networks (GANs). However, GAN-based approaches mostly generate photo-realistic results only as long as the testing data distribution is close to the training data distribution. The quality of GAN results degrades significantly when testing images come from a slightly different distribution. Moreover, recent work has shown that facial expressions can be synthesized by changing localized face regions. In this work, we propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel. The proposed method achieves good generalization capability by leveraging only a few hundred training images. Experimental results demonstrate that the proposed method performs comparably to state-of-the-art GANs on in-dataset images and significantly better on out-of-dataset images. In addition, the proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.

* ICPR 2020, 7 pages, 5 figures 
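
The idea that each output pixel observes only one input pixel can be illustrated with a minimal sketch. It assumes an independent affine map per pixel, fit by ridge-regularized least squares on aligned image pairs; the function names, the `ridge` parameter, and the affine form are illustrative assumptions, not the authors' model.

```python
def fit_pixelwise_affine(inputs, targets, ridge=1e-3):
    """Fit y_p = w_p * x_p + b_p independently for every pixel p.

    inputs, targets: lists of flattened images (lists of floats) of equal size.
    ridge: hypothetical regularizer that keeps the slope finite for constant pixels.
    """
    n = len(inputs)
    num_pixels = len(inputs[0])
    weights, biases = [], []
    for p in range(num_pixels):
        xs = [img[p] for img in inputs]
        ys = [img[p] for img in targets]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        w = cov / (var + ridge)  # 1D regularized least-squares slope
        weights.append(w)
        biases.append(mean_y - w * mean_x)
    return weights, biases


def synthesize(image, weights, biases):
    """Map each input pixel to its output pixel using only that pixel's parameters."""
    return [w * x + b for x, w, b in zip(image, weights, biases)]
```

Because the parameter count is two numbers per pixel in this sketch, the model stays small, which is in the spirit of the size claim in the abstract.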

Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos

Oct 22, 2020
Zhengxia Zou

This paper proposes a vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles. Different from previous sky editing methods that either focus on static photos or require inertial measurement units integrated in smartphones on shooting videos, our method is purely vision-based, without any requirements on the capturing devices, and can be well applied to either online or offline processing scenarios. Our method runs in real-time and is free of user interactions. We decompose this artistic creation process into a couple of proxy tasks including sky matting, motion estimation, and image blending. Experiments are conducted on videos diversely captured in the wild by handheld smartphones and dash cameras, and show high fidelity and good generalization of our method in both visual quality and lighting/motion dynamics. Our code and animated results are available at \url{}.

* project website: 
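
The image blending step named in the abstract can be sketched as plain alpha compositing with a soft sky matte. This is a minimal illustration under the assumption that the matte holds per-pixel sky probabilities in [0, 1]; the function name is hypothetical, and the harmonization (relighting and recoloring) described in the paper is not modeled here.

```python
def blend_sky(frame, sky, matte):
    """Composite a replacement sky into a frame.

    frame, sky: rows of (r, g, b) tuples with the same shape.
    matte: rows of floats in [0, 1], where 1 means the pixel is fully sky.
    """
    out = []
    for frame_row, sky_row, matte_row in zip(frame, sky, matte):
        out_row = []
        for frame_px, sky_px, alpha in zip(frame_row, sky_row, matte_row):
            # Per-channel linear interpolation between the frame and the new sky.
            out_row.append(tuple(alpha * s + (1.0 - alpha) * f
                                 for f, s in zip(frame_px, sky_px)))
        out.append(out_row)
    return out
```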

CommuNety: A Deep Learning System for the Prediction of Cohesive Social Communities

Jul 29, 2020
Syed Afaq Ali Shah, Weifeng Deng, Jianxin Li, Muhammad Aamir Cheema, Abdul Bais

Effective mining of social media, which consists of a large number of users, is a challenging task. Traditional approaches rely on the analysis of text data related to users to accomplish this task. However, text data lacks significant information about the social users and their associated groups. In this paper, we propose CommuNety, a deep learning system for the prediction of cohesive social networks using images. The proposed deep learning model consists of a hierarchical CNN architecture that learns descriptive features related to each cohesive network. The paper also proposes a novel Face Co-occurrence Frequency algorithm to quantify the existence of people in images, and a novel photo ranking method to analyze the strength of the relationship between different individuals in a predicted social network. We extensively evaluate the proposed technique on the PIPA dataset and compare it with state-of-the-art methods. Our experimental results demonstrate the superior performance of the proposed technique for predicting the relationships between different individuals and the cohesiveness of communities.
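
The abstract does not define the Face Co-occurrence Frequency algorithm. One plausible reading of "co-occurrence" is a count of how often two identified people appear in the same photo, which the sketch below computes. The function name and input format are assumptions for illustration only and should not be taken as the paper's algorithm.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(photos):
    """Count how many photos each pair of people shares.

    photos: list of lists of person identifiers detected in each photo.
    Returns a Counter keyed by alphabetically ordered (person_a, person_b) pairs.
    """
    counts = Counter()
    for people in photos:
        # set() ignores duplicate detections of one person within a photo.
        for pair in combinations(sorted(set(people)), 2):
            counts[pair] += 1
    return counts
```

Higher counts in this sketch would suggest a stronger tie between two people; how the paper turns such frequencies into a ranking is not stated in the abstract.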


Transferable Active Grasping and Real Embodied Dataset

Apr 28, 2020
Xiangyu Chen, Zelin Ye, Jiankai Sun, Yuda Fan, Fang Hu, Chenxi Wang, Cewu Lu

Grasping in cluttered scenes is challenging for robot vision systems, as detection accuracy can be hindered by partial occlusion of objects. We adopt a reinforcement learning (RL) framework and 3D vision architectures to search for feasible grasping viewpoints using hand-mounted RGB-D cameras. To overcome the disadvantages of photo-realistic environment simulation, we propose a large-scale dataset called Real Embodied Dataset (RED), which includes full-viewpoint real samples on the upper hemisphere with amodal annotation and enables a simulator that has real visual feedback. Based on this dataset, we develop a practical 3-stage transferable active grasping pipeline that adapts to unseen cluttered scenes. In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior. The grasping pipeline and its possible variants are evaluated with extensive experiments both in simulation and on a real-world UR-5 robotic arm.


Neural Hair Rendering

Apr 28, 2020
Menglei Chai, Jian Ren, Sergey Tulyakov

In this paper, we propose a generic neural-based hair rendering pipeline that can synthesize photo-realistic images from virtual 3D hair models. Unlike existing supervised translation methods that require model-level similarity to preserve consistent structure representation for both real images and fake renderings, our method adopts an unsupervised solution to work on arbitrary hair models. The key component of our method is a shared latent space to encode appearance-invariant structure information of both domains, which generates realistic renderings conditioned by extra appearance inputs. This is achieved by domain-specific pre-disentangled structure representation, partially shared domain encoder layers, and a structure discriminator. We also propose a simple yet effective temporal conditioning method to enforce consistency for video sequence generation. We demonstrate the superiority of our method by testing it on a large number of portraits and comparing it with alternative baselines and state-of-the-art unsupervised image translation methods.


Close-Proximity Underwater Terrain Mapping Using Learning-based Coarse Range Estimation

Jan 02, 2020
Bilal Arain, Feras Dayoub, Paul Rigby, Matthew Dunbabin

This paper presents a novel approach to underwater terrain mapping for Autonomous Underwater Vehicles (AUVs) operating in close proximity to complex 3D environments. The approach leverages a coarse learning-based scene range estimator from monocular images, which can filter transient objects such as fish and lighting aberrations. The proposed methodology then creates a probabilistic elevation map of the terrain using the learning-based scene range estimator as a sensor. The approach considers uncertainty in estimated scene range and robot pose as the AUV moves through the environment. The resulting elevation map can be used for reactive path planning and obstacle avoidance to allow robotic systems to follow the underwater terrain as closely as possible. The performance of our approach is evaluated in simulation by comparing the reconstructed terrain to ground truth reference maps in a photo-realistic underwater environment. The method is also demonstrated using field data collected within a coral reef environment by an AUV.

* 8 pages, 13 figures, submitted to IEEE RA-L 
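
A probabilistic elevation map is commonly maintained by keeping a height mean and variance per grid cell and fusing each new measurement with a one-dimensional Kalman update. The sketch below shows that generic update. It assumes the range and pose uncertainties mentioned in the abstract have already been combined into a single measurement variance, and the function name is hypothetical; the paper's exact formulation may differ.

```python
def fuse_height(cell_mean, cell_var, meas_height, meas_var):
    """Fuse one height measurement into one elevation-map cell.

    cell_mean, cell_var: current estimate, or None for a cell not yet observed.
    meas_height, meas_var: new height measurement and its variance.
    Returns the updated (mean, variance).
    """
    if cell_mean is None:
        # First observation of this cell: adopt the measurement directly.
        return meas_height, meas_var
    gain = cell_var / (cell_var + meas_var)  # weight given to the new measurement
    new_mean = cell_mean + gain * (meas_height - cell_mean)
    new_var = (1.0 - gain) * cell_var  # variance shrinks with every fusion
    return new_mean, new_var
```

A noisy range estimate (large `meas_var`) barely moves the cell, which is how a coarse learned estimator can still produce a usable map over many observations.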

DeepSFM: Structure From Motion Via Deep Bundle Adjustment

Dec 20, 2019
Xingkui Wei, Yinda Zhang, Zhuwen Li, Yanwei Fu, Xiangyang Xue

Structure from motion (SfM) is an essential computer vision problem which has not been well handled by deep learning. One of the promising trends is to apply explicit structural constraints, e.g. a 3D cost volume, in the network. In this work, we design a physically driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost volume based architectures for depth and pose estimation respectively, iteratively running to improve both. In each cost volume, we encode not only photo-metric consistency across multiple input images, but also geometric consistency to ensure that depths from multiple views agree with each other. The explicit constraints on both depth (structure) and pose (motion), when combined with the learning components, bring the merits of both traditional BA and emerging deep learning technology. Extensive experiments on various datasets show that our model achieves state-of-the-art performance on both depth and pose estimation, with superior robustness to fewer inputs and to noise in initialization.


LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport

Sep 05, 2019
Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang

Estimating mutual information is an important machine learning and statistics problem. To estimate the mutual information from data, a common practice is preparing a set of paired samples. However, in some cases, it is difficult to obtain a large number of data pairs. To address this problem, we propose squared-loss mutual information (SMI) estimation using a small number of paired samples and the available unpaired ones. We first represent SMI through the density ratio function, where the expectation is approximated by the samples from marginals and its assignment parameters. The objective is formulated using the optimal transport problem and quadratic programming. Then, we introduce the least-square mutual information-Sinkhorn algorithm (LSMI-Sinkhorn) for efficient optimization. Through experiments, we first demonstrate that the proposed method can estimate the SMI without a large number of paired samples. We also evaluate and show the effectiveness of the proposed LSMI-Sinkhorn on various types of machine learning problems such as image matching and photo album summarization.

* 14 pages 
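
For readers unfamiliar with the Sinkhorn component, the sketch below shows the generic Sinkhorn-Knopp iteration for entropy-regularized optimal transport between two discrete distributions. It is not the LSMI-Sinkhorn algorithm itself, which per the abstract also involves SMI estimation and quadratic programming; the function name and the `epsilon` and `num_iters` defaults are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.1, num_iters=200):
    """Approximate the entropy-regularized transport plan between a and b.

    cost: n x m list of lists with pairwise costs.
    a, b: source and target marginals (each sums to 1).
    epsilon: regularization strength; smaller values give sharper plans.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

The returned plan can be read as soft assignment weights between unpaired samples, which is the role the abstract assigns to the optimal transport step.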

Leveraging Uncertainty in Deep Learning for Selective Classification

May 23, 2019
Mehmet Yigit Yildirim, Mert Ozer, Hasan Davulcu

The wide and rapid adoption of deep learning by practitioners has brought unintended consequences in many situations, as in the infamous case of Google Photos' racist image recognition algorithm, and has thus made it necessary to use the quantified uncertainty of each prediction. There have been recent efforts towards quantifying uncertainty in conventional deep learning methods (e.g., dropout as Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. In this study, we propose a mixed-integer programming framework for classification with reject option (also known as selective classification) that investigates and combines model uncertainty and predictive mean to identify optimal classification and rejection regions. Our results indicate superior performance of our framework both in non-rejected accuracy and rejection quality on several publicly available datasets. Moreover, we extend our framework to cost-sensitive settings and show that our approach significantly outperforms industry-standard methods for online fraud management in real-world settings.
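
Classification with a reject option can be sketched with fixed thresholds on the two quantities the abstract names: the predictive mean and the model uncertainty, both estimated here from repeated stochastic forward passes (as with dropout at test time). The paper learns the rejection regions with mixed-integer programming; the hand-set thresholds and function name below are stand-in assumptions, not the authors' method.

```python
from statistics import mean, pvariance


def selective_predict(mc_probs, min_confidence=0.7, max_variance=0.02):
    """Return a class index, or None to reject the input.

    mc_probs: list of per-pass class-probability lists for ONE input
              (T stochastic passes, each with C class probabilities).
    min_confidence, max_variance: hypothetical hand-set thresholds.
    """
    num_classes = len(mc_probs[0])
    class_means = [mean(p[c] for p in mc_probs) for c in range(num_classes)]
    label = max(range(num_classes), key=lambda c: class_means[c])
    # Spread of the winning class's probability across passes.
    variance = pvariance([p[label] for p in mc_probs])
    if class_means[label] < min_confidence or variance > max_variance:
        return None  # abstain: prediction is weak or unstable
    return label
```

Non-rejected accuracy is then the accuracy computed over only those inputs for which the function returns a label.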


Im2Pencil: Controllable Pencil Illustration from Photographs

Mar 20, 2019
Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. This is a challenging task due to multiple stroke types (e.g., outline and shading), structural complexity of pencil shading (e.g., hatching), and the lack of aligned training data pairs. To address these challenges, we develop a two-branch model that learns separate filters for generating sketchy outlines and tonal shading from a collection of pencil drawings. We create training data pairs by extracting clean outlines and tonal illustrations from original pencil drawings using image filtering techniques, and we manually label the drawing styles. In addition, our model creates different pencil styles (e.g., line sketchiness and shading style) in a user-controllable manner. Experimental results on different types of pencil drawings show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and user evaluations.

* Accepted by CVPR 2019 