"photo": models, code, and papers

Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video

Nov 26, 2020
Ben Saunders, Necati Cihan Camgoz, Richard Bowden

To be truly understandable and accepted by Deaf communities, an automatic Sign Language Production (SLP) system must generate a photo-realistic signer. Prior approaches based on graphical avatars have proven unpopular, whereas recent neural SLP works that produce skeleton pose sequences have been shown to be not understandable to Deaf viewers. In this paper, we propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language. We employ a transformer architecture with a Mixture Density Network (MDN) formulation to handle the translation from spoken language to skeletal pose. A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence. This allows the photo-realistic production of sign videos directly translated from written text. We further propose a novel keypoint-based loss function, which significantly improves the quality of synthesized hand images, operating in the keypoint space to avoid issues caused by motion blur. In addition, we introduce a method for controllable video generation, enabling training on large, diverse sign language datasets and providing the ability to control the signer appearance at inference. Using a dataset of eight different sign language interpreters extracted from broadcast footage, we show that SignGAN significantly outperforms all baseline methods for quantitative metrics and human perceptual studies.
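As a rough illustration of the Mixture Density Network formulation mentioned above, the sketch below evaluates the negative log-likelihood of a single target value under a 1-D Gaussian mixture. This is the generic MDN objective, not necessarily the exact loss used in SignGAN; the function names `gaussian_pdf` and `mdn_nll` and the example numbers are assumptions made for illustration.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mdn_nll(y, weights, means, sigmas):
    """Negative log-likelihood of y under a 1-D Gaussian mixture.

    weights, means and sigmas are the per-component parameters that an
    MDN head would predict; weights must be non-negative and sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("mixture weights must sum to 1")
    likelihood = sum(
        w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(likelihood)

# Hypothetical two-component prediction for one skeletal joint coordinate.
loss = mdn_nll(0.3, weights=[0.7, 0.3], means=[0.25, -0.5], sigmas=[0.1, 0.2])
```

A target close to the dominant component yields a lower loss than one far from every component, which is what lets a mixture represent several plausible poses instead of averaging them.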

Head Reconstruction from Internet Photos

Sep 13, 2018
Shu Liang, Linda G. Shapiro, Ira Kemelmacher-Shlizerman

3D face reconstruction from Internet photos has recently produced exciting results. A person's face, e.g., Tom Hanks, can be modeled and animated in 3D from a completely uncalibrated photo collection. Most methods, however, focus solely on the face area and mask out the rest of the head. This paper proposes that head modeling from the Internet is a problem we can solve. We target reconstruction of the rough shape of the head. Our method is to gradually "grow" the head mesh starting from the frontal face and extending to the rest of the views using photometric stereo constraints. We call our method the boundary-value growing algorithm. Results on photos of celebrities downloaded from the Internet are presented.

* Published at ECCV 2016

Photo style transfer with consistency losses

May 09, 2020
Xu Yao, Gilles Puy, Patrick Pérez

We address the problem of style transfer between two photos and propose a new way to preserve photorealism. Using the single pair of photos available as input, we train a pair of deep convolutional networks (convnets), each of which transfers the style of one photo to the other. To enforce photorealism, we introduce a content-preserving mechanism by combining a cycle-consistency loss with a self-consistency loss. Experimental results show that this method does not suffer from the typical artifacts observed in methods working in the same setting. We then further analyze some properties of these trained convnets. First, we notice that they can be used to stylize other unseen images with the same known style. Second, we show that retraining only a small subset of the network parameters can be sufficient to adapt these convnets to new styles.

* In 2019 IEEE International Conference on Image Processing (ICIP) (pp. 2314-2318). IEEE 
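
The combination of a cycle-consistency loss and a self-consistency loss described in the abstract above can be sketched on toy data as follows. This is a minimal sketch under stated assumptions: the abstract does not define the self-consistency term, so it is assumed here to mean that a network leaves a photo already in its target style unchanged. The names `l1`, `consistency_loss`, `f_ab`, `f_ba`, and `self_weight` are hypothetical, and plain lists of floats stand in for images.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def consistency_loss(f_ab, f_ba, photo_a, photo_b, self_weight=1.0):
    """Cycle-consistency plus an (assumed) self-consistency term.

    f_ab restyles a photo towards the style of B, f_ba towards the style of A.
    """
    # Cycle: mapping A -> B -> A (and B -> A -> B) should return the input.
    cycle = l1(f_ba(f_ab(photo_a)), photo_a) + l1(f_ab(f_ba(photo_b)), photo_b)
    # Self (assumption): a photo already in the target style stays unchanged.
    self_term = l1(f_ab(photo_b), photo_b) + l1(f_ba(photo_a), photo_a)
    return cycle + self_weight * self_term

# Toy stand-ins for the two convnets: a brightness shift and its inverse.
def brighten(photo):
    return [x + 0.1 for x in photo]

def darken(photo):
    return [x - 0.1 for x in photo]

photo_a = [0.2, 0.4, 0.6]
photo_b = [0.5, 0.7, 0.9]
loss = consistency_loss(brighten, darken, photo_a, photo_b)
```

In this toy case the two mappings invert each other, so the cycle term is essentially zero and the whole loss comes from the self-consistency term, which penalises each mapping for altering a photo that is already in its target style.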

From Photo Streams to Evolving Situations

Feb 20, 2017
Mengfan Tang, Feiping Nie, Siripen Pongpaichet, Ramesh Jain

Photos are becoming spontaneous, objective, and universal sources of information. This paper develops evolving situation recognition using photo streams coming from disparate sources combined with the advances of deep learning. Using visual concepts in photos together with space and time information, we formulate situation detection as a semi-supervised learning problem and propose new graph-based models to solve it. To extend the method to unknown situations, we introduce a soft label method which enables the traditional semi-supervised learning framework to accurately predict predefined labels as well as effectively form new clusters. To overcome noisy data, which degrades graph quality and leads to poor recognition results, we take advantage of two kinds of noise-robust norms which can eliminate the adverse effects of outliers in visual concepts and improve the accuracy of situation recognition. Finally, we demonstrate the idea and the effectiveness of the proposed model on the Yahoo Flickr Creative Commons 100 Million dataset.

CariGANs: Unpaired Photo-to-Caricature Translation

Nov 02, 2018
Kaidi Cao, Jing Liao, Lu Yuan

Facial caricature is an art form of drawing faces in an exaggerated way to convey humor or sarcasm. In this paper, we propose the first Generative Adversarial Network (GAN) for unpaired photo-to-caricature translation, which we call "CariGANs". It explicitly models geometric exaggeration and appearance stylization using two components: CariGeoGAN, which only models the geometry-to-geometry transformation from face photos to caricatures, and CariStyGAN, which transfers the style appearance from caricatures to face photos without any geometry deformation. In this way, a difficult cross-domain translation problem is decoupled into two easier tasks. The perceptual study shows that caricatures generated by our CariGANs are closer to the hand-drawn ones, and at the same time better preserve the identity, compared to state-of-the-art methods. Moreover, our CariGANs allow users to control the shape exaggeration degree and change the color/texture style by tuning the parameters or giving an example caricature.

* ACM Transactions on Graphics, Vol. 37, No. 6, Article 244. Publication date: November 2018 
* To appear at SIGGRAPH Asia 2018 

Modeling Fashion Influence from Photos

Nov 17, 2020
Ziad Al-Halah, Kristen Grauman

The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively. We propose to discover and quantify fashion influences from catalog and social media photos. We explore fashion influence along two channels: geolocation and fashion brands. We introduce an approach that detects which of these entities influence which other entities in terms of propagating their styles. We then leverage the discovered influence patterns to inform a novel forecasting model that predicts the future popularity of any given style within any given city or brand. To demonstrate our idea, we leverage public large-scale datasets of 7.7M Instagram photos from 44 major world cities (where styles are worn with variable frequency) as well as 41K Amazon product photos (where styles are purchased with variable frequency). Our model learns directly from the image data how styles move between locations and how certain brands affect each other's designs in a predictable way. The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images. Furthermore, the proposed forecasting model achieves state-of-the-art results for challenging style forecasting tasks. Our results indicate the advantage of grounding visual style evolution both spatially and temporally, and for the first time, they quantify the propagation of inter-brand and inter-city influences.

* To appear in the IEEE Transactions on Multimedia, 2020. Project page: arXiv admin note: substantial text overlap with arXiv:2004.01316 

All the people around me: face discovery in egocentric photo-streams

May 12, 2017
Maedeh Aghaei, Mariella Dimiccoli, Petia Radeva

Given an unconstrained stream of images captured by a wearable photo-camera (2fpm), we propose an unsupervised bottom-up approach for automatically clustering the faces that appear into the individual identities present in the data. The problem is challenging since images are acquired under real-world conditions; hence the visible appearance of the people in the images undergoes intensive variations. Our proposed pipeline consists of first arranging the photo-stream into events, then localizing the appearances of multiple people in them, and finally grouping the various appearances of the same person across different events. Experiments on a dataset acquired by wearing a photo-camera for one month demonstrate the effectiveness of the proposed approach.

* 5 pages, 3 figures, accepted in IEEE International Conference on Image Processing (ICIP 2017) 

Social Style Characterization from Egocentric Photo-streams

Sep 18, 2017
Maedeh Aghaei, Mariella Dimiccoli, Cristian Canton Ferrer, Petia Radeva

This paper proposes a system for automatic social pattern characterization using a wearable photo-camera. The proposed pipeline consists of three major steps. The first step detects the people with whom the camera wearer interacts, and the second categorizes the detected social interactions into formal and informal. These two steps act at the event level, where each potential social event is modeled as a multi-dimensional time-series whose dimensions correspond to a set of relevant features for each task, and an LSTM network is employed for time-series classification. In the last step, recurrences of the same person across the whole set of social interactions are clustered to achieve a comprehensive understanding of the diversity and frequency of the social relations of the user. Experiments on a dataset acquired by a user wearing a photo-camera for a month show promising results on the task of social pattern characterization from egocentric photo-streams.

* International Conference on Computer Vision (ICCV). Workshop on Egocentric Perception, Interaction and Computing

Pseudo Rehearsal using non photo-realistic images

Apr 28, 2020
Bhasker Sri Harsha Suri, Kalidas Yeturu

Deep neural networks forget previously learnt tasks when they are faced with learning new tasks. This is called catastrophic forgetting. Rehearsing the neural network with the training data of the previous task can protect the network from catastrophic forgetting. Since rehearsing requires the storage of the entire previous data, pseudo rehearsal was proposed, where samples belonging to the previous data are generated synthetically for rehearsal. In an image classification setting, while current techniques try to generate synthetic data that is photo-realistic, we demonstrated that neural networks can be rehearsed on data that is not photo-realistic and still achieve good retention of the previous task. We also demonstrated that forgoing the constraint of photo-realism in the generated data can result in a significant reduction in the consumption of computational and memory resources for pseudo rehearsal.
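
The pseudo-rehearsal idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' method: it assumes the synthetic samples are plain random vectors labelled by the previous-task model, and every name in it (`make_pseudo_samples`, `build_rehearsal_set`, `old_model`) is hypothetical.

```python
import random

def make_pseudo_samples(old_model, n_samples, n_features, seed=0):
    """Label random (non photo-realistic) inputs with the old model's outputs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = [rng.uniform(0.0, 1.0) for _ in range(n_features)]
        samples.append((x, old_model(x)))
    return samples

def build_rehearsal_set(new_task_data, pseudo_samples, seed=0):
    """Mix new-task examples with pseudo samples so both appear in training."""
    mixed = list(new_task_data) + list(pseudo_samples)
    random.Random(seed).shuffle(mixed)
    return mixed

# Hypothetical previous-task classifier: class 1 if the mean pixel is bright.
def old_model(x):
    return 1 if sum(x) / len(x) > 0.5 else 0

pseudo = make_pseudo_samples(old_model, n_samples=8, n_features=4)
new_data = [([0.0, 0.0, 0.0, 0.0], 2), ([1.0, 1.0, 1.0, 1.0], 3)]
train_set = build_rehearsal_set(new_data, pseudo)
```

Training the network on `train_set` instead of `new_data` alone exposes it to the old model's behaviour while it learns the new classes, without storing any of the original training images.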
