Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

3D Photography using Context-aware Layered Depth Inpainting

Apr 09, 2020
Meng-Li Shih, Shih-Yang Su, Johannes Kopf, Jia-Bin Huang

We propose a method for converting a single RGB-D input image into a 3D photo - a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as underlying representation, and present a learning-based inpainting model that synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts compared with the state of the arts.

* CVPR 2020. Project page: Code: 

Iconify: Converting Photographs into Icons

Apr 07, 2020
Takuro Karamatsu, Gibran Benitez-Garcia, Keiji Yanai, Seiichi Uchida

In this paper, we tackle a challenging domain conversion task between photo and icon images. Although icons often originate from real object images (i.e., photographs), severe abstractions and simplifications are applied to generate icon images by professional graphic designers. Moreover, there is no one-to-one correspondence between the two domains, for this reason we cannot use it as the ground-truth for learning a direct conversion function. Since generative adversarial networks (GAN) can undertake the problem of domain conversion without any correspondence, we test CycleGAN and UNIT to generate icons from objects segmented from photo images. Our experiments with several image datasets prove that CycleGAN learns sufficient abstraction and simplification ability to generate icon-like images.

* to appear at 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM'20) 

What Makes Kevin Spacey Look Like Kevin Spacey

Jun 02, 2015
Supasorn Suwajanakorn, Ira Kemelmacher-Shlizerman, Steve Seitz

We reconstruct a controllable model of a person from a large photo collection that captures his or her {\em persona}, i.e., physical appearance and behavior. The ability to operate on unstructured photo collections enables modeling a huge number of people, including celebrities and other well photographed people without requiring them to be scanned. Moreover, we show the ability to drive or {\em puppeteer} the captured person B using any other video of a different person A. In this scenario, B acts out the role of person A, but retains his/her own personality and character. Our system is based on a novel combination of 3D face reconstruction, tracking, alignment, and multi-texture modeling, applied to the puppeteering problem. We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video.


Survey on Sparse Coded Features for Content Based Face Image Retrieval

Feb 20, 2014
D. Johnvictor, G. Selvavinayagam

Content based image retrieval, a technique which uses visual contents of image to search images from large scale image databases according to users' interests. This paper provides a comprehensive survey on recent technology used in the area of content based face image retrieval. Nowadays digital devices and photo sharing sites are getting more popularity, large human face photos are available in database. Multiple types of facial features are used to represent discriminality on large scale human facial image database. Searching and mining of facial images are challenging problems and important research issues. Sparse representation on features provides significant improvement in indexing related images to query image.

* International Journal of Computer Trends and Technology (IJCTT) 8(1):30-33, February 2014. ISSN:2231-2803 
* 4 pages,3 figures,1 table, Published with International Journal of Computer Trends and Technology (IJCTT) 

Aesthetic Quality Assessment for Group photograph

Feb 04, 2020
Yaoting Wang, Yongzhen Ke, Kai Wang, Cuijiao Zhang, Fan Qin

Image aesthetic quality assessment has got much attention in recent years, but not many works have been done on a specific genre of photos: Group photograph. In this work, we designed a set of high-level features based on the experience and principles of group photography: Opened-eye, Gaze, Smile, Occluded faces, Face Orientation, Facial blur, Character center. Then we combined them and 83 generic aesthetic features to build two aesthetic assessment models. We also constructed a large dataset of group photographs - GPD- annotated with the aesthetic score. The experimental result shows that our features perform well for categorizing professional photos and snapshots and predicting the distinction of multiple group photographs of diverse human states under the same scene.


MemexQA: Visual Memex Question Answering

Aug 04, 2017
Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection. Towards solving the task, we 1) present the MemexQA dataset, a large, realistic multimodal dataset consisting of real personal photos and crowd-sourced questions/answers, 2) propose MemexNet, a unified, end-to-end trainable network architecture for image, text and video question answering. Experimental results on the MemexQA dataset demonstrate that MemexNet outperforms strong baselines and yields the state-of-the-art on this novel and challenging task. The promising results on TextQA and VideoQA suggest MemexNet's efficacy and scalability across various QA tasks.



Apr 24, 2017
Hamid Izadinia, Qi Shan, Steven M. Seitz

Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method in standard scene understanding benchmarks where we obtain significant improvement.

* To appear at CVPR 2017 

Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

May 21, 2016
Xingchao Peng, Judy Hoffman, Stella X. Yu, Kate Saenko

We address the difficult problem of distinguishing fine-grained object categories in low resolution images. Wepropose a simple an effective deep learning approach that transfers fine-grained knowledge gained from high resolution training data to the coarse low-resolution test scenario. Such fine-to-coarse knowledge transfer has many real world applications, such as identifying objects in surveillance photos or satellite images where the image resolution at the test time is very low but plenty of high resolution photos of similar objects are available. Our extensive experiments on two standard benchmark datasets containing fine-grained car models and bird species demonstrate that our approach can effectively transfer fine-detail knowledge to coarse-detail imagery.

* 5 pages, accepted by ICIP 2016 

AQPDBJUT Dataset: Picture-Based PM2.5 Monitoring in the Campus of BJUT

Mar 19, 2020
Yonghui Zhang, Ke Gu, Zhifang Xia, Junfei Qiao

Ensuring the students in good physical levels is imperative for their future health. In recent years, the continually growing concentration of Particulate Matter (PM) has done increasingly serious harm to student health. Hence, it is highly required to prevent and control PM concentrations in the campus. As the source of PM prevention and control, developing a good model for PM monitoring is extremely urgent and has posed a big challenge. It has been found in prior works that photo-based methods are available for PM monitoring. To verify the effectiveness of existing PM monitoring methods in the campus, we establish a new dataset which includes 1,500 photos collected in the Beijing University of Technology. Experiments show that stated-of-the-art methods are far from ideal for PM2.5 monitoring in the campus.


WarpGAN: Automatic Caricature Generation

Nov 28, 2018
Yichun Shi, Debayan Deb, Anil K. Jain

We propose, WarpGAN, a fully automatic network that can generate caricatures given an input face photo. Besides transferring rich texture styles, WarpGAN learns to automatically predict a set of control points that can warp the photo into a caricature, while preserving identity. We introduce an identity-preserving adversarial loss that aids the discriminator to distinguish between different subjects. Moreover, WarpGAN allows customization of the generated caricatures by controlling the exaggeration extent and the visual styles. Experimental results on a public domain dataset, WebCaricature, show that WarpGAN is capable of generating a diverse set of caricatures while preserving the identities. Five caricature experts suggest that caricatures generated by WarpGAN are visually similar to hand-drawn ones and only prominent facial features are exaggerated.