"photo": models, code, and papers

Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Feb 03, 2020
Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

Visual storytelling is the task of creating a short story from a photo stream. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions but also human-like narration and semantics. However, the VIST dataset consists only of a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gaps between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full, plausible story even with the missing photo(s). Furthermore, we propose a hide-and-tell model for visual storytelling, designed to learn non-local relations across photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods on automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.

* AAAI 2020 paper
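
The photo-hiding scheme is easy to picture in code. Below is a minimal sketch of the idea, assuming precomputed photo features; the storytelling network, the feature shapes, and the masking-by-zeroing choice are illustrative, not the paper's exact implementation.

```python
import torch

def hide_photos(photo_feats, max_hidden=2):
    """Randomly hide 1..max_hidden photos per stream.

    photo_feats: (B, N, D) features of an N-photo input stack.
    """
    B, N, _ = photo_feats.shape
    masked = photo_feats.clone()
    for b in range(B):
        k = torch.randint(1, max_hidden + 1, (1,)).item()
        idx = torch.randperm(N)[:k]
        masked[b, idx] = 0.0  # hidden photos: the model must imagine the gap
    return masked

# Training still supervises the FULL story (model and loss are placeholders):
# logits = model(hide_photos(photo_feats))
# loss = criterion(logits, full_story_tokens)
```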

Attribute-controlled face photo synthesis from simple line drawing

Feb 09, 2017
Qi Guo, Ce Zhu, Zhiqiang Xia, Zhengtao Wang, Yipeng Liu

Face photo synthesis from a simple line drawing is a one-to-many task, as a simple line drawing contains merely the contour of a human face. Previous exemplar-based methods are over-dependent on their datasets and hard to generalize to complicated natural scenes. Recently, several works have utilized deep neural networks to increase generalization, but they remain limited in user controllability. In this paper, we propose a deep generative model to synthesize face photos from simple line drawings, controlled by face attributes such as hair color and complexion. To maximize the controllability of face attributes, an attribute-disentangled variational auto-encoder (AD-VAE) is first introduced to learn latent representations disentangled with respect to specified attributes. We then conduct photo synthesis from simple line drawings based on the AD-VAE. Experiments show that our model can well disentangle the variations of attributes from other variations of face photos and synthesize detailed, photorealistic face images with the desired attributes. Regarding background and illumination as the style and the human face as the content, we can also synthesize face photos with the target style of a style photo.

* 5 pages, 5 figures
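
As a rough illustration of the AD-VAE idea, here is a hedged sketch in which the decoder is conditioned on explicit attributes while an auxiliary head discourages the latent code from carrying attribute information. The MLP encoder, layer sizes, and the auxiliary head are hypothetical simplifications of the paper's architecture.

```python
import torch
import torch.nn as nn

class ADVAE(nn.Module):
    def __init__(self, img_dim=64 * 64, attr_dim=10, z_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, z_dim)
        self.logvar = nn.Linear(512, z_dim)
        # the decoder sees the attribute vector explicitly, so z can drop it
        self.dec = nn.Sequential(nn.Linear(z_dim + attr_dim, 512), nn.ReLU(),
                                 nn.Linear(512, img_dim), nn.Sigmoid())
        # auxiliary head tries to recover attributes from z; training z to
        # defeat it pushes attribute information out of the latent code
        self.attr_head = nn.Linear(z_dim, attr_dim)

    def forward(self, x, attrs):
        h = self.enc(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.dec(torch.cat([z, attrs], dim=1))
        return recon, mu, logvar, self.attr_head(z)
```

Swapping `attrs` at test time then changes, say, hair color, while z preserves the rest of the face.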

A Photo-Based Mobile Crowdsourcing Framework for Event Reporting

May 03, 2020
Aymen Hamrouni, Hakim Ghazzai, Mounir Frikha, Yehia Massoud

Photo-based Mobile Crowdsourcing (MCS) is an emerging field of interest and a trending topic in ubiquitous computing. It has recently drawn substantial attention from the smart-city and urban-computing communities. In fact, the built-in cameras of mobile devices are becoming the most common means of visual logging in our daily lives. Photo-based MCS frameworks collect photos in a distributed way, with a large number of contributors uploading photos whenever and wherever it suits them. This inevitably leads to evolving picture streams that may contain misleading and redundant information affecting the task result. To overcome these issues, we develop, in this paper, a solution for selecting highly relevant data from an evolving picture stream and ensuring correct submission. The proposed photo-based MCS framework for event reporting incorporates (i) a deep learning model to eliminate false submissions and ensure photo credibility and (ii) an A-Tree-shaped data structure for clustering streaming pictures to reduce information redundancy and provide maximum event coverage. Simulation results indicate that the implemented framework can effectively reduce false submissions and select a subset with high utility coverage and a low redundancy ratio from the streaming data.

* Published in 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), Dallas, TX, USA, 2019, pp. 198-202
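
The redundancy-reduction step can be illustrated with a simple greedy filter over a stream of photo embeddings. This is a minimal sketch of the selection idea only; the paper's A-Tree clustering and credibility CNN are not reproduced here.

```python
import numpy as np

def select_from_stream(embeddings, threshold=0.8):
    """Greedy novelty filter over unit-norm photo embeddings in arrival order."""
    selected = []
    for e in embeddings:
        # keep a photo only if it is dissimilar to everything kept so far
        if all(np.dot(e, s) < threshold for s in selected):
            selected.append(e)
    return selected
```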

3D Moments from Near-Duplicate Photos

May 12, 2022
Qianqian Wang, Zhengqi Li, David Salesin, Noah Snavely, Brian Curless, Janne Kontkanen

We introduce 3D Moments, a new computational photography effect. As input we take a pair of near-duplicate photos, i.e., photos of moving subjects from similar viewpoints, common in people's photo collections. As output, we produce a video that smoothly interpolates the scene motion from the first photo to the second, while also producing camera motion with parallax that gives a heightened sense of 3D. To achieve this effect, we represent the scene as a pair of feature-based layered depth images augmented with scene flow. This representation enables motion interpolation along with independent control of the camera viewpoint. Our system produces photorealistic space-time videos with motion parallax and scene dynamics, while plausibly recovering regions occluded in the original views. We conduct extensive experiments demonstrating superior performance over baselines on public datasets and in-the-wild photos. Project page: https://3d-moments.github.io/

* CVPR 2022
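
At its core, the effect interpolates both scene motion and camera pose between the two photos. A heavily simplified sketch of that interpolation is below; the paper's layered depth images, feature splatting, and inpainting are far more involved, and the linear pose blend is a naive stand-in for proper SE(3) interpolation.

```python
import numpy as np

def interpolate_scene(pts0, flow_0to1, pose0, pose1, t):
    """pts0: (N, 3) points from photo 0; flow_0to1: (N, 3) scene flow; t in [0, 1]."""
    pts_t = pts0 + t * flow_0to1          # move points along their scene flow
    pose_t = (1 - t) * pose0 + t * pose1  # naive camera interpolation
    return pts_t, pose_t
```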

CheXphoto: 10,000+ Smartphone Photos and Synthetic Photographic Transformations of Chest X-rays for Benchmarking Deep Learning Robustness

Jul 13, 2020
Nick A. Phillips, Pranav Rajpurkar, Mark Sabini, Rayan Krishnan, Sharon Zhou, Anuj Pareek, Nguyet Minh Phu, Chris Wang, Andrew Y. Ng, Matthew P. Lungren

Clinical deployment of deep learning algorithms for chest x-ray interpretation requires a solution that can integrate into the vast spectrum of clinical workflows across the world. An appealing solution to scaled deployment is to leverage the existing ubiquity of smartphones: in several parts of the world, clinicians and radiologists capture photos of chest x-rays to share with other experts or clinicians via smartphone using messaging services like WhatsApp. However, the application of chest x-ray algorithms to photos of chest x-rays requires reliable classification in the presence of smartphone photo artifacts such as screen glare and poor viewing angle not typically encountered on digital x-rays used to train machine learning models. We introduce CheXphoto, a dataset of smartphone photos and synthetic photographic transformations of chest x-rays sampled from the CheXpert dataset. To generate CheXphoto we (1) automatically and manually captured photos of digital x-rays under different settings, including various lighting conditions and locations, and, (2) generated synthetic transformations of digital x-rays targeted to make them look like photos of digital x-rays and x-ray films. We release this dataset as a resource for testing and improving the robustness of deep learning algorithms for automated chest x-ray interpretation on smartphone photos of chest x-rays.

  
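To give a flavor of the synthetic photographic transformations, here is a hedged sketch that adds a glare spot and a small random tilt to an x-ray image. The dataset's actual transformation set and parameters are documented with CheXphoto itself; these functions are illustrative stand-ins.

```python
import numpy as np
from PIL import Image

def add_glare(img, center, radius, strength=0.6):
    """Brighten a circular region to mimic screen glare on a photographed x-ray."""
    arr = np.asarray(img, dtype=np.float32)
    yy, xx = np.mgrid[0:arr.shape[0], 0:arr.shape[1]]
    d = np.sqrt((xx - center[0]) ** 2 + (yy - center[1]) ** 2)
    glare = np.clip(1.0 - d / radius, 0.0, 1.0) * strength * 255.0
    if arr.ndim == 3:
        glare = glare[..., None]
    return Image.fromarray(np.clip(arr + glare, 0, 255).astype(np.uint8))

def random_tilt(img, max_deg=8.0):
    """Small random rotation as a stand-in for a full perspective warp."""
    angle = float(np.random.uniform(-max_deg, max_deg))
    return img.rotate(angle, resample=Image.BILINEAR)
```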

Adversarial Open Domain Adaption Framework (AODA): Sketch-to-Photo Synthesis

Aug 19, 2021
Amey Thakur, Mega Satish

This paper aims to demonstrate the efficiency of the Adversarial Open Domain Adaption framework for sketch-to-photo synthesis. Unsupervised open-domain adaption for generating realistic photos from a hand-drawn sketch is challenging, as no sketches of some classes are available as training data. The absence of learning supervision and the huge gap between the freehand-drawing and picture domains make the task hard. We present an approach that learns both sketch-to-photo and photo-to-sketch generation, synthesising the missing freehand drawings from pictures. Due to the domain gap between synthetic sketches and genuine ones, a generator trained on false drawings may produce unsatisfactory results when dealing with drawings of the missing classes. To address this problem, we offer a simple but effective open-domain sampling and optimisation method that tricks the generator into considering false drawings as genuine. Our approach generalises the learnt sketch-to-photo and photo-to-sketch mappings from in-domain inputs to open-domain categories. On the Scribble and SketchyCOCO datasets, we compared our technique to the most recent competing methods. For many types of open-domain drawings, our model achieves impressive results in synthesising accurate colour and substance and in retaining the structural layout.

* This was an undergraduate research effort, and in retrospect, it isn't comprehensive enough
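
A hedged sketch of the "treat synthetic sketches as genuine" trick follows: for open-domain classes only generated sketches exist, so the sketch-to-photo generator is optimized on them as if they were real inputs. All names (`G_s2p`, `D_photo`, etc.) are hypothetical placeholders, not the authors' code.

```python
import torch
import torch.nn.functional as F

def generator_step(G_s2p, D_photo, real_sketches, synth_sketches):
    # in-domain: genuine hand-drawn sketches -> photos
    photos_in = G_s2p(real_sketches)
    # open-domain: synthetic sketches go through the SAME path and are
    # not flagged as fake -- the generator treats them as genuine inputs
    photos_open = G_s2p(synth_sketches.detach())
    logits = D_photo(torch.cat([photos_in, photos_open], dim=0))
    # non-saturating GAN loss: all generated photos should look real
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
```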

WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

Mar 03, 2018
Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool

Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE), a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can be generally crawled from the Internet; the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we emphasize extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to obtain reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets, as well as pictures from several generations of smartphones, demonstrate that WESPE produces results comparable to or better than those of state-of-the-art strongly supervised methods.

  
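The weak-supervision setup can be summarized as a loss over two unpaired datasets. Below is a minimal sketch under assumed components: two adversarial critics trained on the high-quality set, plus a content term that maps the enhanced photo back toward the source. Network names, the L1 content term, and the weights are placeholders rather than the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def wespe_style_loss(G, G_inv, D_color, D_texture, src, w=(1.0, 1.0, 10.0)):
    enhanced = G(src)
    logits_c = D_color(enhanced)
    logits_t = D_texture(enhanced)
    # adversarial terms: the enhanced photo should fool critics that model
    # the color and texture statistics of the high-quality image set
    adv_c = F.binary_cross_entropy_with_logits(logits_c, torch.ones_like(logits_c))
    adv_t = F.binary_cross_entropy_with_logits(logits_t, torch.ones_like(logits_t))
    # content term: an inverse generator maps the result back to the source,
    # so enhancement cannot discard the original content
    content = F.l1_loss(G_inv(enhanced), src)
    return w[0] * adv_c + w[1] * adv_t + w[2] * content
```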

Learning Image-adaptive 3D Lookup Tables for High Performance Photo Enhancement in Real-time

Sep 30, 2020
Hui Zeng, Jianrui Cai, Lida Li, Zisheng Cao, Lei Zhang

Recent years have witnessed the increasing popularity of learning-based methods for enhancing the color and tone of photos. However, many existing photo enhancement methods either deliver unsatisfactory results or consume too much computation and memory, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In this paper, we learn image-adaptive 3-dimensional lookup tables (3D LUTs) to achieve fast and robust photo enhancement. 3D LUTs are widely used for manipulating the color and tone of photos, but they are usually manually tuned and fixed in the camera imaging pipeline or photo editing tools. To the best of our knowledge, we are the first to propose learning 3D LUTs from annotated data using pairwise or unpaired learning. More importantly, our learned 3D LUT is image-adaptive for flexible photo enhancement. We learn multiple basis 3D LUTs and a small convolutional neural network (CNN) simultaneously in an end-to-end manner. The small CNN works on a down-sampled version of the input image to predict content-dependent weights that fuse the multiple basis 3D LUTs into an image-adaptive one, which is then employed to transform the color and tone of source images efficiently. Our model contains fewer than 600K parameters and takes less than 2 ms to process an image of 4K resolution on one Titan RTX GPU. While being highly efficient, our model also outperforms state-of-the-art photo enhancement methods by a large margin in terms of PSNR, SSIM and a color difference metric on two publicly available benchmark datasets.

* High quality adaptive photo enhancement in real-time (<2ms for 4K resolution images)! Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
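
A hedged sketch of the core mechanism: a small CNN predicts per-image weights from a downsampled input, the weights fuse basis 3D LUTs, and the fused LUT is applied by trilinear interpolation (here via `grid_sample`). The layer sizes, weight network, and RGB-to-grid axis convention are illustrative, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLUT(nn.Module):
    def __init__(self, n_luts=3, lut_size=33):
        super().__init__()
        # basis LUTs: (n_luts, 3, D, D, D), each mapping an RGB cube to RGB
        self.luts = nn.Parameter(torch.rand(n_luts, 3, lut_size, lut_size, lut_size))
        self.weight_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_luts))

    def forward(self, img):  # img: (B, 3, H, W) in [0, 1]
        small = F.interpolate(img, size=(256, 256), mode='bilinear',
                              align_corners=False)
        w = torch.softmax(self.weight_net(small), dim=1)        # (B, n_luts)
        fused = torch.einsum('bn,ncdhw->bcdhw', w, self.luts)   # (B, 3, D, D, D)
        # grid_sample wants coords in [-1, 1]; pixel RGB values index the LUT
        grid = img.permute(0, 2, 3, 1).unsqueeze(1) * 2 - 1     # (B, 1, H, W, 3)
        out = F.grid_sample(fused, grid, mode='bilinear', align_corners=True)
        return out.squeeze(2)                                   # (B, 3, H, W)
```

Running the weight CNN on a small thumbnail while the LUT lookup does the per-pixel work is what keeps the per-image adaptation cheap enough for real-time 4K use.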