Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Enhance Gender and Identity Preservation in Face Aging Simulation for Infants and Toddlers

Nov 15, 2020
Yao Xiao, Yijun Zhao

Realistic age-progressed photos provide invaluable biometric information in a wide range of applications. In recent years, deep learning-based approaches have made remarkable progress in modeling the aging process of the human face. Nevertheless, it remains a challenging task to generate accurate age-progressed faces from infant or toddler photos. In particular, the lack of visually detectable gender characteristics and the drastic appearance changes in early life contribute to the difficulty of the task. We propose a new deep learning method inspired by the successful Conditional Adversarial Autoencoder (CAAE, 2017) model. In our approach, we extend the CAAE architecture to 1) incorporate gender information, and 2) augment the model's overall architecture with an identity-preserving component based on facial features. We trained our model using the publicly available UTKFace dataset and evaluated our model by simulating up to 100 years of aging on 1,156 male and 1,207 female infant and toddler face photos. Compared to the CAAE approach, our new model demonstrates noticeable visual improvements. Quantitatively, our model exhibits an overall gain of 77.0% (male) and 13.8% (female) in gender fidelity measured by a gender classifier for the simulated photos across the age spectrum. Our model also demonstrates a 22.4% gain in identity preservation measured by a facial recognition neural network.

* 8 pages, 2 figures 

Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network

Nov 28, 2018
Peng Lu, Hangyu Lin, Yanwei Fu, Shaogang Gong, Yu-Gang Jiang, Xiangyang Xue

Sketch has been employed as an effective communicative tool to express the abstract and intuitive meanings of object. Recognizing the free-hand sketch drawing is extremely useful in many real-world applications. While content-based sketch recognition has been studied for several decades, the instance-level Sketch-Based Image Retrieval (SBIR) tasks have attracted significant research attention recently. The existing datasets such as QMUL-Chair and QMUL-Shoe, focus on the retrieval tasks of chairs and shoes. However, there are several key limitations in previous instance-level SBIR works. The state-of-the-art works have to heavily rely on the pre-training process, quality of edge maps, multi-cropping testing strategy, and augmenting sketch images. To efficiently solve the instance-level SBIR, we propose a new Deep Triplet Classification Siamese Network (DeepTCNet) which employs DenseNet-169 as the basic feature extractor and is optimized by the triplet loss and classification loss. Critically, our proposed DeepTCNet can break the limitations existed in previous works. The extensive experiments on five benchmark sketch datasets validate the effectiveness of the proposed model. Additionally, to study the tasks of sketch-based hairstyle retrieval, this paper contributes a new instance-level photo-sketch dataset - Hairstyle Photo-Sketch dataset, which is composed of 3600 sketches and photos, and 2400 sketch-photo pairs.

* 15 pages, 7 figures. Peng Lu and Hangyu Lin share equal contributions 

Comparision and analysis of photo image forgery detection techniques

Jan 10, 2013
S. Murali, Govindraj B. Chittapur, Prabhakara H. S, Basavaraj S. Anami

Digital Photo images are everywhere, on the covers of magazines, in newspapers, in courtrooms, and all over the Internet. We are exposed to them throughout the day and most of the time. Ease with which images can be manipulated; we need to be aware that seeing does not always imply believing. We propose methodologies to identify such unbelievable photo images and succeeded to identify forged region by given only the forged image. Formats are additive tag for every file system and contents are relatively expressed with extension based on most popular digital camera uses JPEG and Other image formats like png, bmp etc. We have designed algorithm running behind with the concept of abnormal anomalies and identify the forgery regions.

* 12 pages, International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.6, December 2012 

Projective Urban Texturing

Feb 04, 2022
Yiangos Georgiou, Melinos Averkiou, Tom Kelly, Evangelos Kalogerakis

This paper proposes a method for automatic generation of textures for 3D city meshes in immersive urban environments. Many recent pipelines capture or synthesize large quantities of city geometry using scanners or procedural modeling pipelines. Such geometry is intricate and realistic, however the generation of photo-realistic textures for such large scenes remains a problem. We propose to generate textures for input target 3D meshes driven by the textural style present in readily available datasets of panoramic photos capturing urban environments. Re-targeting such 2D datasets to 3D geometry is challenging because the underlying shape, size, and layout of the urban structures in the photos do not correspond to the ones in the target meshes. Photos also often have objects (e.g., trees, vehicles) that may not even be present in the target geometry. To address these issues we present a method, called Projective Urban Texturing (PUT), which re-targets textural style from real-world panoramic images to unseen urban meshes. PUT relies on contrastive and adversarial training of a neural architecture designed for unpaired image-to-texture translation. The generated textures are stored in a texture atlas applied to the target 3D mesh geometry. To promote texture consistency, PUT employs an iterative procedure in which texture synthesis is conditioned on previously generated, adjacent textures. We demonstrate both quantitative and qualitative evaluation of the generated textures.

* International Conference on 3D Vision 2021 

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

Nov 24, 2021
Hao Ren, Ziqiang Zheng, Yang Wu, Hong Lu, Yang Yang, Sai-Kit Yeung

The huge domain gap between sketches and photos and the highly abstract sketch representations pose challenges for sketch-based image retrieval (\underline{SBIR}). The zero-shot sketch-based image retrieval (\underline{ZS-SBIR}) is more generic and practical but poses an even greater challenge because of the additional knowledge gap between the seen and unseen categories. To simultaneously mitigate both gaps, we propose an \textbf{A}pproaching-and-\textbf{C}entralizing \textbf{Net}work (termed ``\textbf{ACNet}'') to jointly optimize sketch-to-photo synthesis and the image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain, and thus better serve the retrieval module than ever to learn domain-agnostic representations and category-agnostic common knowledge for generalizing to unseen categories. These diverse images generated with retrieval guidance can effectively alleviate the overfitting problem troubling concrete category-specific training samples with high gradients. We also discover the use of proxy-based NormSoftmax loss is effective in the zero-shot setting because its centralizing effect can stabilize our joint training and promote the generalization ability to unseen categories. Our approach is simple yet effective, which achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.


Spatio-Temporal Sentiment Hotspot Detection Using Geotagged Photos

Sep 21, 2016
Yi Zhu, Shawn Newsam

We perform spatio-temporal analysis of public sentiment using geotagged photo collections. We develop a deep learning-based classifier that predicts the emotion conveyed by an image. This allows us to associate sentiment with place. We perform spatial hotspot detection and show that different emotions have distinct spatial distributions that match expectations. We also perform temporal analysis using the capture time of the photos. Our spatio-temporal hotspot detection correctly identifies emerging concentrations of specific emotions and year-by-year analyses of select locations show there are strong temporal correlations between the predicted emotions and known events.

* To appear in ACM SIGSPATIAL 2016 

Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval

Apr 06, 2019
Sounak Dey, Pau Riba, Anjan Dutta, Josep Llados, Yi-Zhe Song

In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning across 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos into a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance significantly outperforms that of state-of-the-art on existing datasets that can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research


GIF: Generative Interpretable Faces

Aug 31, 2020
Partha Ghosh, Pravir Singh Gupta, Roy Uziel, Anurag Ranjan, Michael Black, Timo Bolkart

Photo-realistic visualization and animation of expressive human faces have been a long standing challenge. On one end of the spectrum, 3D face modeling methods provide parametric control but tend to generate unrealistic images, while on the other end, generative 2D models like GANs (Generative Adversarial Networks) output photo-realistic face images, but lack explicit control. Recent methods gain partial control, either by attempting to disentangle different factors in an unsupervised manner, or by adding control post hoc to a pre-trained model. Trained GANs without pre-defined control, however, may entangle factors that are hard to undo later. To guarantee some disentanglement that provides us with desired kinds of control, we train our generative model conditioned on pre-defined control parameters. Specifically, we condition StyleGAN2 on FLAME, a generative 3D face model. However, we found out that a naive conditioning on FLAME parameters yields rather unsatisfactory results. Instead we render out geometry and photo-metric details of the FLAME mesh and use these for conditioning instead. This gives us a generative 2D face model named GIF (Generative Interpretable Faces) that shares FLAME's parametric control. Given FLAME parameters for shape, pose, and expressions, parameters for appearance and lighting, and an additional style vector, GIF outputs photo-realistic face images. To evaluate how well GIF follows its conditioning and the impact of different design choices, we perform a perceptual study. The code and trained model are publicly available for research purposes at


DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification

Nov 16, 2016
Shu Zhang, Ran He, Tieniu Tan

MeshFace photos have been widely used in many Chinese business organizations to protect ID face photos from being misused. The occlusions incurred by random meshes severely degenerate the performance of face verification systems, which raises the MeshFace verification problem between MeshFace and daily photos. Previous methods cast this problem as a typical low-level vision problem, i.e. blind inpainting. They recover perceptually pleasing clear ID photos from MeshFaces by enforcing pixel level similarity between the recovered ID images and the ground-truth clear ID images and then perform face verification on them. Essentially, face verification is conducted on a compact feature space rather than the image pixel space. Therefore, this paper argues that pixel level similarity and feature level similarity jointly offer the key to improve the verification performance. Based on this insight, we offer a novel feature oriented blind face inpainting framework. Specifically, we implement this by establishing a novel DeMeshNet, which consists of three parts. The first part addresses blind inpainting of the MeshFaces by implicitly exploiting extra supervision from the occlusion position to enforce pixel level similarity. The second part explicitly enforces a feature level similarity in the compact feature space, which can explore informative supervision from the feature space to produce better inpainting results for verification. The last part copes with face alignment within the net via a customized spatial transformer module when extracting deep facial features. All the three parts are implemented within an end-to-end network that facilitates efficient optimization. Extensive experiments on two MeshFace datasets demonstrate the effectiveness of the proposed DeMeshNet as well as the insight of this paper.

* 10pages, submitted to CVPR 17