Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources

Jul 09, 2018
Hosnieh Sattar, Gerard Pons-Moll, Mario Fritz

To study the correlation between clothing garments and body shape, we collected a new dataset (Fashion Takes Shape), which includes images of users with clothing category annotations. We employ our multi-photo approach to estimate body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated and show that our multi-photo approach leads to a better predictive model for clothing categories compared to models based on single-view shape estimates or manually annotated body types. We see our method as the first step towards the large-scale understanding of clothing preferences from body shape.


Identity Signals in Emoji Do not Influence Perception of Factual Truth on Twitter

May 07, 2021
Alexander Robertson, Walid Magdy, Sharon Goldwater

Prior work has shown that Twitter users use skin-toned emoji as an act of self-representation to express their racial/ethnic identity. We test whether this signal of identity can influence readers' perceptions about the content of a post containing that signal. In a large scale (n=944) pre-registered controlled experiment, we manipulate the presence of skin-toned emoji and profile photos in a task where readers rate obscure trivia facts (presented as tweets) as true or false. Using a Bayesian statistical analysis, we find that neither emoji nor profile photo has an effect on how readers rate these facts. This result will be of some comfort to anyone concerned about the manipulation of online users through the crafting of fake profiles.


3D Photography using Context-aware Layered Depth Inpainting

Apr 14, 2020
Meng-Li Shih, Shih-Yang Su, Johannes Kopf, Jia-Bin Huang

We propose a method for converting a single RGB-D input image into a 3D photo - a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as underlying representation, and present a learning-based inpainting model that synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts compared with the state of the arts.

* CVPR 2020. Project page: Code: Demo: 

Iconify: Converting Photographs into Icons

Apr 07, 2020
Takuro Karamatsu, Gibran Benitez-Garcia, Keiji Yanai, Seiichi Uchida

In this paper, we tackle a challenging domain conversion task between photo and icon images. Although icons often originate from real object images (i.e., photographs), severe abstractions and simplifications are applied to generate icon images by professional graphic designers. Moreover, there is no one-to-one correspondence between the two domains, for this reason we cannot use it as the ground-truth for learning a direct conversion function. Since generative adversarial networks (GAN) can undertake the problem of domain conversion without any correspondence, we test CycleGAN and UNIT to generate icons from objects segmented from photo images. Our experiments with several image datasets prove that CycleGAN learns sufficient abstraction and simplification ability to generate icon-like images.

* to appear at 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM'20) 

What Makes Kevin Spacey Look Like Kevin Spacey

Jun 02, 2015
Supasorn Suwajanakorn, Ira Kemelmacher-Shlizerman, Steve Seitz

We reconstruct a controllable model of a person from a large photo collection that captures his or her {\em persona}, i.e., physical appearance and behavior. The ability to operate on unstructured photo collections enables modeling a huge number of people, including celebrities and other well photographed people without requiring them to be scanned. Moreover, we show the ability to drive or {\em puppeteer} the captured person B using any other video of a different person A. In this scenario, B acts out the role of person A, but retains his/her own personality and character. Our system is based on a novel combination of 3D face reconstruction, tracking, alignment, and multi-texture modeling, applied to the puppeteering problem. We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video.


Survey on Sparse Coded Features for Content Based Face Image Retrieval

Feb 20, 2014
D. Johnvictor, G. Selvavinayagam

Content based image retrieval, a technique which uses visual contents of image to search images from large scale image databases according to users' interests. This paper provides a comprehensive survey on recent technology used in the area of content based face image retrieval. Nowadays digital devices and photo sharing sites are getting more popularity, large human face photos are available in database. Multiple types of facial features are used to represent discriminality on large scale human facial image database. Searching and mining of facial images are challenging problems and important research issues. Sparse representation on features provides significant improvement in indexing related images to query image.

* International Journal of Computer Trends and Technology (IJCTT) 8(1):30-33, February 2014. ISSN:2231-2803 
* 4 pages,3 figures,1 table, Published with International Journal of Computer Trends and Technology (IJCTT) 

Aesthetic Quality Assessment for Group photograph

Feb 04, 2020
Yaoting Wang, Yongzhen Ke, Kai Wang, Cuijiao Zhang, Fan Qin

Image aesthetic quality assessment has got much attention in recent years, but not many works have been done on a specific genre of photos: Group photograph. In this work, we designed a set of high-level features based on the experience and principles of group photography: Opened-eye, Gaze, Smile, Occluded faces, Face Orientation, Facial blur, Character center. Then we combined them and 83 generic aesthetic features to build two aesthetic assessment models. We also constructed a large dataset of group photographs - GPD- annotated with the aesthetic score. The experimental result shows that our features perform well for categorizing professional photos and snapshots and predicting the distinction of multiple group photographs of diverse human states under the same scene.


MemexQA: Visual Memex Question Answering

Aug 04, 2017
Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection. Towards solving the task, we 1) present the MemexQA dataset, a large, realistic multimodal dataset consisting of real personal photos and crowd-sourced questions/answers, 2) propose MemexNet, a unified, end-to-end trainable network architecture for image, text and video question answering. Experimental results on the MemexQA dataset demonstrate that MemexNet outperforms strong baselines and yields the state-of-the-art on this novel and challenging task. The promising results on TextQA and VideoQA suggest MemexNet's efficacy and scalability across various QA tasks.



Apr 24, 2017
Hamid Izadinia, Qi Shan, Steven M. Seitz

Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method in standard scene understanding benchmarks where we obtain significant improvement.

* To appear at CVPR 2017