Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Defocus Blur Detection via Salient Region Detection Prior

Nov 19, 2020
Ming Qian, Min Xia, Chunyi Sun, Zhiwei Wang, Liguo Weng

Defocus blur always occurred in photos when people take photos by Digital Single Lens Reflex Camera(DSLR), giving salient region and aesthetic pleasure. Defocus blur Detection aims to separate the out-of-focus and depth-of-field areas in photos, which is an important work in computer vision. Current works for defocus blur detection mainly focus on the designing of networks, the optimizing of the loss function, and the application of multi-stream strategy, meanwhile, these works do not pay attention to the shortage of training data. In this work, to address the above data-shortage problem, we turn to rethink the relationship between two tasks: defocus blur detection and salient region detection. In an image with bokeh effect, it is obvious that the salient region and the depth-of-field area overlap in most cases. So we first train our network on the salient region detection tasks, then transfer the pre-trained model to the defocus blur detection tasks. Besides, we propose a novel network for defocus blur detection. Experiments show that our transfer strategy works well on many current models, and demonstrate the superiority of our network.


Multi-Density Sketch-to-Image Translation Network

Jun 18, 2020
Jialu Huang, Jing Liao, Zhifeng Tan, Sam Kwong

Sketch-to-image (S2I) translation plays an important role in image synthesis and manipulation tasks, such as photo editing and colorization. Some specific S2I translation including sketch-to-photo and sketch-to-painting can be used as powerful tools in the art design industry. However, previous methods only support S2I translation with a single level of density, which gives less flexibility to users for controlling the input sketches. In this work, we propose the first multi-level density sketch-to-image translation framework, which allows the input sketch to cover a wide range from rough object outlines to micro structures. Moreover, to tackle the problem of noncontinuous representation of multi-level density input sketches, we project the density level into a continuous latent space, which can then be linearly controlled by a parameter. This allows users to conveniently control the densities of input sketches and generation of images. Moreover, our method has been successfully verified on various datasets for different applications including face editing, multi-modal sketch-to-photo translation, and anime colorization, providing coarse-to-fine levels of controls to these applications.

* 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works 

The MegaFace Benchmark: 1 Million Faces for Recognition at Scale

Dec 02, 2015
Ira Kemelmacher-Shlizerman, Steve Seitz, Daniel Miller, Evan Brossard

Recent face recognition experiments on a major benchmark LFW show stunning performance--a number of algorithms achieve near to perfect score, surpassing human recognition rates. In this paper, we advocate evaluations at the million scale (LFW includes only 13K photos of 5K people). To this end, we have assembled the MegaFace dataset and created the first MegaFace challenge. Our dataset includes One Million photos that capture more than 690K different individuals. The challenge evaluates performance of algorithms with increasing numbers of distractors (going from 10 to 1M) in the gallery set. We present both identification and verification performance, evaluate performance with respect to pose and a person's age, and compare as a function of training data size (number of photos and people). We report results of state of the art and baseline algorithms. Our key observations are that testing at the million scale reveals big performance differences (of algorithms that perform similarly well on smaller scale) and that age invariant recognition as well as pose are still challenging for most. The MegaFace dataset, baseline code, and evaluation scripts, are all publicly released for further experimentations at:


A Pseudo Multi-Exposure Fusion Method Using Single Image

Aug 01, 2018
Yuma Kinoshita, Sayaka Shiota, Hitoshi Kiya

This paper proposes a novel pseudo multi-exposure image fusion method based on a single image. Multi-exposure image fusion is used to produce images without saturation regions, by using photos with different exposures. However, it is difficult to take photos suited for the multi-exposure image fusion when we take a photo of dynamic scenes or record a video. In addition, the multi-exposure image fusion cannot be applied to existing images with a single exposure or videos. The proposed method enables us to produce pseudo multi-exposure images from a single image. To produce multi-exposure images, the proposed method utilizes the relationship between the exposure values and pixel values, which is obtained by assuming that a digital camera has a linear response function. Moreover, it is shown that the use of a local contrast enhancement method allows us to produce pseudo multi-exposure images with higher quality. Most of conventional multi-exposure image fusion methods are also applicable to the proposed multi-exposure images. Experimental results show the effectiveness of the proposed method by comparing the proposed one with conventional ones.

* To appear in IEICE Trans. Fundamentals, vol.E101-A, no.11, November 2018 

Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos

Dec 01, 2021
Foivos Paraperas Papantoniou, Panagiotis P. Filntisis, Petros Maragos, Anastasios Roussos

In this paper, we introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos. The proposed method is based on a parametric 3D face representation of the actor in the input scene that offers a reliable disentanglement of the facial identity from the head pose and facial expressions. It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking into account their dynamics. Finally, the altered facial expressions are used to photo-realistically manipulate the facial region in the input scene based on an especially-designed neural face renderer. To the best of our knowledge, our method is the first to be capable of controlling the actor's facial expressions by even using as a sole input the semantic labels of the manipulated emotions, while at the same time preserving the speech-related lip movements. We conduct extensive qualitative and quantitative evaluations and comparisons, which demonstrate the effectiveness of our approach and the especially promising results that we obtain. Our method opens a plethora of new possibilities for useful applications of neural rendering technologies, ranging from movie post-production and video games to photo-realistic affective avatars.


Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision

Sep 09, 2020
Yuxiang Wei, Ming Liu, Haolin Wang, Ruifeng Zhu, Guosheng Hu, Wangmeng Zuo

Despite recent advances in deep learning-based face frontalization methods, photo-realistic and illumination preserving frontal face synthesis is still challenging due to large pose and illumination discrepancy during training. We propose a novel Flow-based Feature Warping Model (FFWM) which can learn to synthesize photo-realistic and illumination preserving frontal images with illumination inconsistent supervision. Specifically, an Illumination Preserving Module (IPM) is proposed to learn illumination preserving image synthesis from illumination inconsistent image pairs. IPM includes two pathways which collaborate to ensure the synthesized frontal images are illumination preserving and with fine details. Moreover, a Warp Attention Module (WAM) is introduced to reduce the pose discrepancy in the feature level, and hence to synthesize frontal images more effectively and preserve more details of profile images. The attention mechanism in WAM helps reduce the artifacts caused by the displacements between the profile and the frontal images. Quantitative and qualitative experimental results show that our FFWM can synthesize photo-realistic and illumination preserving frontal images and performs favorably against the state-of-the-art results.

* ECCV 2020. Code is available at: 

Rendering Natural Camera Bokeh Effect with Deep Learning

Jun 10, 2020
Andrey Ignatov, Jagruti Patel, Radu Timofte

Bokeh is an important artistic effect used to highlight the main object of interest on the photo by blurring all out-of-focus areas. While DSLR and system camera lenses can render this effect naturally, mobile cameras are unable to produce shallow depth-of-field photos due to a very small aperture diameter of their optics. Unlike the current solutions simulating bokeh by applying Gaussian blur to image background, in this paper we propose to learn a realistic shallow focus technique directly from the photos produced by DSLR cameras. For this, we present a large-scale bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR with 50mm f/1.8 lenses. We use these images to train a deep learning model to reproduce a natural bokeh effect based on a single narrow-aperture image. The experimental results show that the proposed approach is able to render a plausible non-uniform bokeh even in case of complex input data with multiple objects. The dataset, pre-trained models and codes used in this paper are available on the project website.


Automatic Image Stylization Using Deep Fully Convolutional Networks

Nov 27, 2018
Feida Zhu, Yizhou Yu

Color and tone stylization strives to enhance unique themes with artistic color and tone adjustments. It has a broad range of applications from professional image postprocessing to photo sharing over social networks. Mainstream photo enhancement softwares provide users with predefined styles, which are often hand-crafted through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of image contents and the resulting global color transform limits the range of artistic styles it can represent. On the other hand, stylistic enhancement needs to apply distinct adjustments to various semantic regions. Such an ability enables a broader range of visual styles. In this paper, we propose a novel deep learning architecture for automatic image stylization, which learns local enhancement styles from image pairs. Our deep learning architecture is an end-to-end deep fully convolutional network performing semantics-aware feature extraction as well as automatic image adjustment prediction. Image stylization can be efficiently accomplished with a single forward pass through our deep network. Experiments on existing datasets for image stylization demonstrate the effectiveness of our deep learning architecture.


Deformable Neural Radiance Fields

Nov 26, 2020
Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Steven M. Seitz, Ricardo Martin-Brualla

We present the first method capable of photorealistically reconstructing a non-rigidly deforming scene using photos/videos captured casually from mobile phones. Our approach -- D-NeRF -- augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that D-NeRF can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies." We evaluate our method by collecting data using a rig with two mobile phones that take time-synchronized photos, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.

* Project page with videos: