Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

The Conditional Analogy GAN: Swapping Fashion Articles on People Images

Sep 14, 2017
Nikolay Jetchev, Urs Bergmann

We present a novel method to solve image analogy problems : it allows to learn the relation between paired images present in training data, and then generalize and generate images that correspond to the relation, but were never seen in the training set. Therefore, we call the method Conditional Analogy Generative Adversarial Network (CAGAN), as it is based on adversarial training and employs deep convolutional neural networks. An especially interesting application of that technique is automatic swapping of clothing on fashion model photos. Our work has the following contributions. First, the definition of the end-to-end trainable CAGAN architecture, which implicitly learns segmentation masks without expensive supervised labeling data. Second, experimental results show plausible segmentation masks and often convincing swapped images, given the target article. Finally, we discuss the next steps for that technique: neural network architecture improvements and more advanced applications.

* To appear at the International Conference on Computer Vision, ICCV 2017, Workshop on Computer Vision for Fashion 
Access Paper or Ask Questions

Neural 3D Reconstruction in the Wild

May 25, 2022
Jiaming Sun, Xi Chen, Qianqian Wang, Zhengqi Li, Hadar Averbuch-Elor, Xiaowei Zhou, Noah Snavely

We are witnessing an explosion of neural implicit representations in computer vision and graphics. Their applicability has recently expanded beyond tasks such as shape generation and image-based rendering to the fundamental problem of image-based 3D reconstruction. However, existing methods typically assume constrained 3D environments with constant illumination captured by a small set of roughly uniformly distributed cameras. We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections in the presence of varying illumination. To achieve this, we propose a hybrid voxel- and surface-guided sampling technique that allows for more efficient ray sampling around surfaces and leads to significant improvements in reconstruction quality. Further, we present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes. We perform extensive experiments, demonstrating that our approach surpasses both classical and neural reconstruction methods on a wide variety of metrics.

* Accepted to SIGGRAPH 2022 (Conference Proceedings). Project page: 
Access Paper or Ask Questions

Object Detection in Indian Food Platters using Transfer Learning with YOLOv4

May 10, 2022
Deepanshu Pandey, Purva Parmar, Gauri Toshniwal, Mansi Goel, Vishesh Agrawal, Shivangi Dhiman, Lavanya Gupta, Ganesh Bagler

Object detection is a well-known problem in computer vision. Despite this, its usage and pervasiveness in the traditional Indian food dishes has been limited. Particularly, recognizing Indian food dishes present in a single photo is challenging due to three reasons: 1. Lack of annotated Indian food datasets 2. Non-distinct boundaries between the dishes 3. High intra-class variation. We solve these issues by providing a comprehensively labelled Indian food dataset- IndianFood10, which contains 10 food classes that appear frequently in a staple Indian meal and using transfer learning with YOLOv4 object detector model. Our model is able to achieve an overall mAP score of 91.8% and f1-score of 0.90 for our 10 class dataset. We also provide an extension of our 10 class dataset- IndianFood20, which contains 10 more traditional Indian food classes.

* 6 pages, 7 figures, 38th IEEE International Conference on Data Engineering, 2022, DECOR Workshop 
Access Paper or Ask Questions

PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models

Mar 25, 2022
Tai-Yin Chiu, Danna Gurari

Photorealistic style transfer entails transferring the style of a reference image to another image so the result seems like a plausible photo. Our work is inspired by the observation that existing models are slow due to their large sizes. We introduce PCA-based knowledge distillation to distill lightweight models and show it is motivated by theory. To our knowledge, this is the first knowledge distillation method for photorealistic style transfer. Our experiments demonstrate its versatility for use with different backbone architectures, VGG and MobileNet, across six image resolutions. Compared to existing models, our top-performing model runs at speeds 5-20x faster using at most 1\% of the parameters. Additionally, our distilled models achieve a better balance between stylization strength and content preservation than existing models. To support reproducing our method and models, we share the code at \textit{}.

Access Paper or Ask Questions

Cellular Network Radio Propagation Modeling with Deep Convolutional Neural Networks

Oct 05, 2021
Xin Zhang, Xiujun Shu, Bingwen Zhang, Jie Ren, Lizhou Zhou, Xin Chen

Radio propagation modeling and prediction is fundamental for modern cellular network planning and optimization. Conventional radio propagation models fall into two categories. Empirical models, based on coarse statistics, are simple and computationally efficient, but are inaccurate due to oversimplification. Deterministic models, such as ray tracing based on physical laws of wave propagation, are more accurate and site specific. But they have higher computational complexity and are inflexible to utilize site information other than traditional global information system (GIS) maps. In this article we present a novel method to model radio propagation using deep convolutional neural networks and report significantly improved performance compared to conventional models. We also lay down the framework for data-driven modeling of radio propagation and enable future research to utilize rich and unconventional information of the site, e.g. satellite photos, to provide more accurate and flexible models.

* Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, 2020, Pages 2378 
Access Paper or Ask Questions

Segmentation of skin lesions and their attributes using Generative Adversarial Networks

Jan 30, 2021
Cristian Lazo

This work is about the semantic segmentation of skin lesion boundary and their attributes using Image-to-Image Translation with Conditional Adversarial Nets. Melanoma is a type of skin cancer that can be cured if detected in time. Segmentation into dermoscopic images is an essential procedure for computer-assisted diagnosis due to its existing artifacts typical of skin images. To alleviate the image annotation process, we propose to use a modified Pix2Pix network. The discriminator network learns the mapping from a dermal image as an input and a mask image of six channels as an output. Likewise, the discriminative network output called PatchGAN is varied for one channel and six output channels. The photos used come from the 2018 ISIC Challenge, where 500 photographs are used with their respective semantic map, divided into 75% for training and 35% for testing. Obtaining for 100 training epochs high Jaccard indices for all attributes of the segmentation map.

* LatinX in AI Research at NeurIPS 2019 
Access Paper or Ask Questions

MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs

Dec 04, 2020
Fangda Han, Guoyao Hao, Ricardo Guerrero, Vladimir Pavlovic

Multilabel conditional image generation is a challenging problem in computer vision. In this work we propose Multi-ingredient Pizza Generator (MPG), a conditional Generative Neural Network (GAN) framework for synthesizing multilabel images. We design MPG based on a state-of-the-art GAN structure called StyleGAN2, in which we develop a new conditioning technique by enforcing intermediate feature maps to learn scalewise label information. Because of the complex nature of the multilabel image generation problem, we also regularize synthetic image by predicting the corresponding ingredients as well as encourage the discriminator to distinguish between matched image and mismatched image. To verify the efficacy of MPG, we test it on Pizza10, which is a carefully annotated multi-ingredient pizza image dataset. MPG can successfully generate photo-realist pizza images with desired ingredients. The framework can be easily extend to other multilabel image generation scenarios.

Access Paper or Ask Questions

Deep Semantic Face Deblurring

Mar 16, 2018
Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang

In this paper, we present an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks (CNNs). As face images are highly structured and share several key semantic components (e.g., eyes and mouths), the semantic information of a face provides a strong prior for restoration. As such, we propose to incorporate global semantic priors as input and impose local structure losses to regularize the output within a multi-scale deep CNN. We train the network with perceptual and adversarial losses to generate photo-realistic results and develop an incremental training strategy to handle random blur kernels in the wild. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm restores sharp images with more facial details and performs favorably against state-of-the-art methods in terms of restoration quality, face recognition and execution speed.

* This work is accepted in CVPR 2018. The project website is on 
Access Paper or Ask Questions

Outlier Cluster Formation in Spectral Clustering

Mar 03, 2017
Takuro Ina, Atsushi Hashimoto, Masaaki Iiyama, Hidekazu Kasahara, Mikihiko Mori, Michihiko Minoh

Outlier detection and cluster number estimation is an important issue for clustering real data. This paper focuses on spectral clustering, a time-tested clustering method, and reveals its important properties related to outliers. The highlights of this paper are the following two mathematical observations: first, spectral clustering's intrinsic property of an outlier cluster formation, and second, the singularity of an outlier cluster with a valid cluster number. Based on these observations, we designed a function that evaluates clustering and outlier detection results. In experiments, we prepared two scenarios, face clustering in photo album and person re-identification in a camera network. We confirmed that the proposed method detects outliers and estimates the number of clusters properly in both problems. Our method outperforms state-of-the-art methods in both the 128-dimensional sparse space for face clustering and the 4,096-dimensional non-sparse space for person re-identification.

* 10 pages, 2 figures, 2 tables 
Access Paper or Ask Questions

Self-Supervised Shadow Removal

Oct 22, 2020
Florin-Alexandru Vasluianu, Andres Romero, Luc Van Gool, Radu Timofte

Shadow removal is an important computer vision task aiming at the detection and successful removal of the shadow produced by an occluded light source and a photo-realistic restoration of the image contents. Decades of re-search produced a multitude of hand-crafted restoration techniques and, more recently, learned solutions from shad-owed and shadow-free training image pairs. In this work,we propose an unsupervised single image shadow removal solution via self-supervised learning by using a conditioned mask. In contrast to existing literature, we do not require paired shadowed and shadow-free images, instead we rely on self-supervision and jointly learn deep models to remove and add shadows to images. We validate our approach on the recently introduced ISTD and USR datasets. We largely improve quantitatively and qualitatively over the compared methods and set a new state-of-the-art performance in single image shadow removal.

* 10 pages, 4 figures, 6 tables 
Access Paper or Ask Questions