"photo": models, code, and papers

CariMe: Unpaired Caricature Generation with Multiple Exaggerations

Oct 01, 2020
Zheng Gu, Chuanqi Dong, Jing Huo, Wenbin Li, Yang Gao

Caricature generation aims to translate real photos into caricatures with artistic styles and shape exaggerations while maintaining the identity of the subject. Unlike generic image-to-image translation, drawing a caricature automatically is more challenging because of the various spatial deformations involved. Previous caricature generation methods focus on predicting a single, definite image warp from a given photo while ignoring the intrinsic representation and distribution of exaggerations in caricatures. This limits their ability to generate diverse exaggerations. In this paper, we generalize the caricature generation problem from instance-level warping prediction to distribution-level deformation modeling. Based on this assumption, we present the first exploration of unpaired CARIcature generation with Multiple Exaggerations (CariMe). Technically, we propose a Multi-exaggeration Warper network to learn the distribution-level mapping from photos to facial exaggerations. This makes it possible to generate diverse and reasonable exaggerations from randomly sampled warp codes given one input photo. To better represent the facial exaggeration and produce fine-grained warping, a deformation-field-based warping method is also proposed, which helps us to capture more detailed exaggerations than other point-based warping methods. Experiments and two perceptual studies demonstrate the superiority of our method compared with other state-of-the-art methods on caricature generation.
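
As a rough illustration of what warping with a dense deformation field means (as opposed to moving a handful of control points), the sketch below applies a per-pixel displacement field to a small grayscale image using backward warping with bilinear sampling. It is not the CariMe implementation: the function names, the list-of-lists image layout, and the border clamping are all assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[float]]                # rows of grayscale values (assumed layout)
Field = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement (assumed layout)


def bilinear(img: Image, x: float, y: float) -> float:
    """Sample img at a real-valued (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(img: Image, field: Field) -> Image:
    """Backward warp: output pixel (x, y) reads from (x + dx, y + dy) in img."""
    h, w = len(img), len(img[0])
    return [
        [bilinear(img, x + field[y][x][0], y + field[y][x][1]) for x in range(w)]
        for y in range(h)
    ]
```

Because every pixel carries its own displacement, such a field can express local, fine-grained deformations. In a learned setting the field would be predicted by a network rather than written by hand.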


Learning Visual Storylines with Skipping Recurrent Neural Networks

Jul 26, 2016
Gunnar A. Sigurdsson, Xinlei Chen, Abhinav Gupta

What does a typical visit to Paris look like? Do people first take photos of the Louvre and then the Eiffel Tower? Can we visually model a temporal event like "Paris Vacation" using current frameworks? In this paper, we explore how we can automatically learn the temporal aspects, or storylines of visual concepts from web data. Previous attempts focus on consecutive image-to-image transitions and are unsuccessful at recovering the long-term underlying story. Our novel Skipping Recurrent Neural Network (S-RNN) model does not attempt to predict each and every data point in the sequence, like classic RNNs. Rather, S-RNN uses a framework that skips through the images in the photo stream to explore the space of all ordered subsets of the albums via an efficient sampling procedure. This approach reduces the negative impact of strong short-term correlations, and recovers the latent story more accurately. We show how our learned storylines can be used to analyze, predict, and summarize photo albums from Flickr. Our experimental results provide strong qualitative and quantitative evidence that S-RNN is significantly better than other candidate methods such as LSTMs on learning long-term correlations and recovering latent storylines. Moreover, we show how storylines can help machines better understand and summarize photo streams by inferring a brief personalized story of each individual album.

* European Conference on Computer Vision (ECCV) 2016 

Multi-granularity Association Learning Framework for on-the-fly Fine-Grained Sketch-based Image Retrieval

Jan 13, 2022
Dawei Dai, Xiaoyu Tang, Shuyin Xia, Yingge Liu, Guoyin Wang, Zizhong Chen

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo for a given query sketch. However, its widespread applicability is limited by the fact that most people find it difficult to draw a complete sketch, and the drawing process often takes time. In this study, we aim to retrieve the target photo with the fewest strokes possible (an incomplete sketch), a setting named on-the-fly FG-SBIR (Bhunia et al. 2020), in which retrieval starts as soon as the drawing begins and is repeated at each stroke. We consider that there is a significant correlation among the incomplete sketches in the sketch drawing episode of each photo. To learn a more efficient joint embedding space shared between the photo and its incomplete sketches, we propose a multi-granularity association learning framework that further optimizes the embedding space of all incomplete sketches. Specifically, based on the integrity of the sketch, we divide a complete sketch episode into several stages, each of which corresponds to a simple linear mapping layer. Moreover, our framework guides the vector space representation of the current sketch to approximate that of its later sketches, so that the retrieval performance of a sketch with fewer strokes approaches that of a sketch with more strokes. In the experiments, we propose more realistic challenges, and our method achieves superior early retrieval efficiency over the state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.

* 17 pages, 9 figures 

Enhance Gender and Identity Preservation in Face Aging Simulation for Infants and Toddlers

Nov 15, 2020
Yao Xiao, Yijun Zhao

Realistic age-progressed photos provide invaluable biometric information in a wide range of applications. In recent years, deep learning-based approaches have made remarkable progress in modeling the aging process of the human face. Nevertheless, it remains a challenging task to generate accurate age-progressed faces from infant or toddler photos. In particular, the lack of visually detectable gender characteristics and the drastic appearance changes in early life contribute to the difficulty of the task. We propose a new deep learning method inspired by the successful Conditional Adversarial Autoencoder (CAAE, 2017) model. In our approach, we extend the CAAE architecture to 1) incorporate gender information, and 2) augment the model's overall architecture with an identity-preserving component based on facial features. We trained our model using the publicly available UTKFace dataset and evaluated our model by simulating up to 100 years of aging on 1,156 male and 1,207 female infant and toddler face photos. Compared to the CAAE approach, our new model demonstrates noticeable visual improvements. Quantitatively, our model exhibits an overall gain of 77.0% (male) and 13.8% (female) in gender fidelity measured by a gender classifier for the simulated photos across the age spectrum. Our model also demonstrates a 22.4% gain in identity preservation measured by a facial recognition neural network.

* 8 pages, 2 figures 

Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network

Nov 28, 2018
Peng Lu, Hangyu Lin, Yanwei Fu, Shaogang Gong, Yu-Gang Jiang, Xiangyang Xue

Sketches have been employed as an effective communicative tool to express the abstract and intuitive meanings of objects. Recognizing free-hand sketch drawings is extremely useful in many real-world applications. While content-based sketch recognition has been studied for several decades, instance-level Sketch-Based Image Retrieval (SBIR) tasks have attracted significant research attention recently. Existing datasets, such as QMUL-Chair and QMUL-Shoe, focus on the retrieval of chairs and shoes. However, there are several key limitations in previous instance-level SBIR works. State-of-the-art works rely heavily on the pre-training process, the quality of edge maps, a multi-cropping testing strategy, and the augmentation of sketch images. To efficiently solve instance-level SBIR, we propose a new Deep Triplet Classification Siamese Network (DeepTCNet), which employs DenseNet-169 as the basic feature extractor and is optimized by the triplet loss and classification loss. Critically, our proposed DeepTCNet can overcome the limitations of previous works. Extensive experiments on five benchmark sketch datasets validate the effectiveness of the proposed model. Additionally, to study the task of sketch-based hairstyle retrieval, this paper contributes a new instance-level photo-sketch dataset, the Hairstyle Photo-Sketch dataset, which is composed of 3600 sketches and photos, and 2400 sketch-photo pairs.

* 15 pages, 7 figures. Peng Lu and Hangyu Lin share equal contributions 
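
The DeepTCNet abstract above says the network is optimized by a triplet loss and a classification loss. A minimal sketch of how two such terms are commonly combined for one training triplet is given below. The squared Euclidean distance, the `margin` value, and the `weight` balancing factor are assumptions for illustration; the abstract does not specify them, and this is not the authors' code.

```python
import math
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.3) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def combined_loss(anchor, positive, negative, logits, label,
                  margin: float = 0.3, weight: float = 1.0) -> float:
    """Triplet term plus a weighted classification term (weight is hypothetical)."""
    return triplet_loss(anchor, positive, negative, margin) + weight * cross_entropy(logits, label)
```

In this setting the anchor would typically be a sketch embedding, the positive its matching photo, and the negative a non-matching photo. The classification term adds a per-instance or per-category label signal on top of the ranking constraint.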

Comparision and analysis of photo image forgery detection techniques

Jan 10, 2013
S. Murali, Govindraj B. Chittapur, Prabhakara H. S, Basavaraj S. Anami

Digital photo images are everywhere: on the covers of magazines, in newspapers, in courtrooms, and all over the Internet. We are exposed to them throughout the day. Given the ease with which images can be manipulated, we need to be aware that seeing does not always imply believing. We propose methodologies to identify such untrustworthy photo images, and we succeed in identifying the forged region given only the forged image. File formats act as an additional tag in every file system, and content is expressed relative to its extension; most popular digital cameras use JPEG, alongside other image formats such as PNG and BMP. We have designed an algorithm based on the concept of abnormal anomalies that identifies the forged regions.

* 12 pages, International Journal on Computational Sciences & Applications (IJCSA) Vol. 2, No. 6, December 2012 

Projective Urban Texturing

Feb 04, 2022
Yiangos Georgiou, Melinos Averkiou, Tom Kelly, Evangelos Kalogerakis

This paper proposes a method for automatic generation of textures for 3D city meshes in immersive urban environments. Many recent pipelines capture or synthesize large quantities of city geometry using scanners or procedural modeling. Such geometry is intricate and realistic; however, the generation of photo-realistic textures for such large scenes remains a problem. We propose to generate textures for input target 3D meshes driven by the textural style present in readily available datasets of panoramic photos capturing urban environments. Re-targeting such 2D datasets to 3D geometry is challenging because the underlying shape, size, and layout of the urban structures in the photos do not correspond to the ones in the target meshes. Photos also often contain objects (e.g., trees, vehicles) that may not even be present in the target geometry. To address these issues, we present a method, called Projective Urban Texturing (PUT), which re-targets textural style from real-world panoramic images to unseen urban meshes. PUT relies on contrastive and adversarial training of a neural architecture designed for unpaired image-to-texture translation. The generated textures are stored in a texture atlas applied to the target 3D mesh geometry. To promote texture consistency, PUT employs an iterative procedure in which texture synthesis is conditioned on previously generated, adjacent textures. We provide both quantitative and qualitative evaluation of the generated textures.

* International Conference on 3D Vision 2021 

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

Nov 24, 2021
Hao Ren, Ziqiang Zheng, Yang Wu, Hong Lu, Yang Yang, Sai-Kit Yeung

The huge domain gap between sketches and photos and the highly abstract sketch representations pose challenges for sketch-based image retrieval (SBIR). The zero-shot sketch-based image retrieval (ZS-SBIR) is more generic and practical but poses an even greater challenge because of the additional knowledge gap between the seen and unseen categories. To simultaneously mitigate both gaps, we propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and the image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain, and thus better serve the retrieval module than ever to learn domain-agnostic representations and category-agnostic common knowledge for generalizing to unseen categories. These diverse images generated with retrieval guidance can effectively alleviate the overfitting problem troubling concrete category-specific training samples with high gradients. We also discover the use of proxy-based NormSoftmax loss is effective in the zero-shot setting because its centralizing effect can stabilize our joint training and promote the generalization ability to unseen categories. Our approach is simple yet effective, which achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
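
For readers unfamiliar with the proxy-based NormSoftmax loss mentioned above, the sketch below shows the loss as it is commonly formulated: the embedding and one learnable proxy vector per class are L2-normalized, their cosine similarities are divided by a temperature, and a softmax cross-entropy is taken against the true class. The function names and the `temperature` default are assumptions for illustration, and this is not the ACNet training code.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def norm_softmax_loss(embedding: Sequence[float],
                      proxies: Sequence[Sequence[float]],
                      label: int,
                      temperature: float = 0.05) -> float:
    """Cross-entropy over temperature-scaled cosine similarities to class proxies."""
    e = l2_normalize(embedding)
    logits = [
        sum(a * b for a, b in zip(e, l2_normalize(p))) / temperature
        for p in proxies
    ]
    peak = max(logits)  # subtract the max before exponentiating for stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]
```

Minimizing this loss pulls each embedding toward the proxy of its own class on the unit sphere. That is one way to read the "centralizing effect" the abstract refers to.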


Spatio-Temporal Sentiment Hotspot Detection Using Geotagged Photos

Sep 21, 2016
Yi Zhu, Shawn Newsam

We perform spatio-temporal analysis of public sentiment using geotagged photo collections. We develop a deep learning-based classifier that predicts the emotion conveyed by an image. This allows us to associate sentiment with place. We perform spatial hotspot detection and show that different emotions have distinct spatial distributions that match expectations. We also perform temporal analysis using the capture time of the photos. Our spatio-temporal hotspot detection correctly identifies emerging concentrations of specific emotions and year-by-year analyses of select locations show there are strong temporal correlations between the predicted emotions and known events.

* To appear in ACM SIGSPATIAL 2016
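
The abstract above does not say which hotspot detection algorithm is used. As a simple stand-in for the idea of finding spatial concentrations of one predicted emotion, the sketch below bins geotagged photos into a latitude/longitude grid and reports cells that hold at least a minimum number of photos with that emotion. The `(lat, lon, emotion)` record layout, the `cell_size`, and the `min_count` threshold are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Photo = Tuple[float, float, str]  # (latitude, longitude, predicted emotion): assumed layout


def grid_hotspots(photos: Iterable[Photo], emotion: str,
                  cell_size: float = 1.0, min_count: int = 3) -> List[Tuple[int, int]]:
    """Return sorted grid-cell indices with at least min_count photos of the given emotion."""
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon, label in photos
        if label == emotion
    )
    return sorted(cell for cell, n in counts.items() if n >= min_count)
```

A temporal variant could run the same count per year, using the capture time of each photo, and compare the resulting cells across years.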