"photo": models, code, and papers

Deep Factorised Inverse-Sketching

Aug 07, 2018
Kaiyue Pang, Da Li, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

Modelling human free-hand sketches has become topical recently, driven by practical applications such as fine-grained sketch-based image retrieval (FG-SBIR). Sketches are clearly related to photo edge-maps, but a human free-hand sketch of a photo is not simply a clean rendering of that photo's edge map. Instead, there is a fundamental process of abstraction and iconic rendering, where overall geometry is warped and salient details are selectively included. In this paper we study this sketching process and attempt to invert it. We model this inversion by translating iconic free-hand sketches to contours that resemble more geometrically realistic projections of object boundaries, and separately factorise out the salient added details. This factorised re-representation makes it easier to match a free-hand sketch to a photo instance of an object. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep FG-SBIR model is then formulated to accommodate complementary discriminative detail from each factorised sketch for better matching with the corresponding photo. Our method is evaluated both qualitatively and quantitatively to demonstrate its superiority over a number of state-of-the-art alternatives for style transfer and FG-SBIR.

* Accepted to ECCV 2018 
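The core of the style transfer component is a cycle that maps a free-hand sketch to a contour rendering and back, with consistency enforced in an embedding space rather than pixel space. Below is a minimal PyTorch sketch of such a cyclic embedding consistency loss; the names `G_sketch2contour`, `G_contour2sketch`, and `embed` are hypothetical stand-ins, not the authors' released code.

```python
import torch.nn.functional as F

def cyclic_embedding_loss(sketch, G_sketch2contour, G_contour2sketch, embed):
    """sketch: batch of free-hand sketches, shape (B, C, H, W).
    The two generators translate between the sketch and contour domains;
    `embed` maps images into a shared latent space."""
    contour = G_sketch2contour(sketch)   # iconic sketch -> contour rendering
    cycled = G_contour2sketch(contour)   # contour -> reconstructed sketch
    # Consistency is measured between embeddings, not raw pixels.
    return F.l1_loss(embed(cycled), embed(sketch))
```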
  

A picture is worth a thousand words but how to organize thousands of pictures?

Mar 15, 2018
Stefan Lonn, Petia Radeva, Mariella Dimiccoli

We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which are typically pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured, and because smartphones are ubiquitous, they present larger variability than pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 10 persons. Experimental results demonstrate better user satisfaction, in terms of organization, than state-of-the-art solutions.
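The topic estimation step is standard pLSA fitted with EM. A compact NumPy sketch of the EM updates is shown below, assuming each photo has already been summarized as a count vector (e.g., over visual words or detected concepts); this illustrates the algorithm, not the authors' implementation.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: (n_docs, n_words) count matrix (one row per photo).
    Returns P(z|d) of shape (n_docs, n_topics) and P(w|z) of shape
    (n_topics, n_words), fitted by EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (n_docs, n_topics, n_words)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(1, keepdims=True) + 1e-12
        weighted = counts[:, None, :] * joint        # n(d,w) * P(z|d,w)
        # M-step: re-estimate both conditionals from the responsibilities
        p_w_z = weighted.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = weighted.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z
```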

  

Hierarchically-Attentive RNN for Album Summarization and Storytelling

Aug 09, 2017
Licheng Yu, Mohit Bansal, Tamara L. Berg

We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) that encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.

* To appear at EMNLP-2017 (7 pages) 
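As a rough illustration of the selection stage, the sketch below scores album photos with an attention head over GRU-contextualized CNN features. Dimensions and layer choices are assumptions for illustration, not the paper's exact architecture.

```python
import torch.nn as nn

class PhotoSelector(nn.Module):
    """Toy selection stage: a bidirectional GRU contextualizes per-photo
    CNN features, and a linear attention head scores each photo."""
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, album_feats):           # (B, n_photos, feat_dim)
        ctx, _ = self.gru(album_feats)        # (B, n_photos, 2*hidden)
        logits = self.score(ctx).squeeze(-1)  # one score per photo
        return logits.softmax(dim=-1)         # selection probabilities
```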
  

ROMNet: Renovate the Old Memories

Feb 05, 2022
Runsheng Xu, Zhengzhong Tu, Yuanqi Du, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Hongkai Yu

Renovating the memories in old photos is an intriguing research topic in computer vision. These legacy images often suffer from severe and commingled degradations such as cracks, noise, and color-fading, while the lack of large-scale paired old photo datasets makes this restoration task very challenging. In this work, we present a novel reference-based end-to-end learning framework that can jointly repair and colorize degraded legacy pictures. Specifically, the proposed framework consists of three modules: a restoration sub-network for degradation restoration, a similarity sub-network for color histogram matching and transfer, and a colorization sub-network that learns to predict the chroma elements of the images conditioned on chromatic reference signals. The whole system takes advantage of the color histogram priors in a given reference image, which vastly reduces the dependency on large-scale training data. Apart from the proposed method, we also create, to our knowledge, the first public real-world old photo dataset with paired ground truth for evaluating old photo restoration models, wherein each old photo is paired with a manually restored pristine image produced by Photoshop experts. Our extensive experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods both quantitatively and qualitatively.

* Submitted to TIP 
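The role of the similarity sub-network, transferring the reference's color statistics, can be approximated at its simplest with classical histogram matching. The snippet below uses scikit-image's `match_histograms` as a crude stand-in for the learned module; the file names are hypothetical.

```python
import numpy as np
from skimage import exposure, io

old = io.imread("old_photo.png")         # degraded legacy photo (hypothetical path)
ref = io.imread("reference_photo.png")   # color reference (hypothetical path)

# Match each color channel's histogram of the old photo to the reference.
matched = exposure.match_histograms(old, ref, channel_axis=-1)
io.imsave("recolored.png", np.clip(matched, 0, 255).astype(np.uint8))
```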
  

BIM Hyperreality: Data Synthesis Using BIM and Hyperrealistic Rendering for Deep Learning

May 10, 2021
Mohammad Alawadhi, Wei Yan

Deep learning is expected to offer new opportunities and a new paradigm for the field of architecture. One such opportunity is teaching neural networks to visually understand architectural elements from the built environment. However, the availability of large training datasets is one of the biggest limitations of neural networks. Also, the vast majority of training data for visual recognition tasks is annotated by humans. To resolve this bottleneck, we present the concept of a hybrid system that uses both building information modeling (BIM) and hyperrealistic (photorealistic) rendering to synthesize datasets for training a neural network for building object recognition in photos. To generate our training dataset, BIMrAI, we used an existing BIM model and a corresponding photo-realistically rendered model of the same building. We created methods for using renderings to train a deep learning model, trained a generative adversarial network (GAN) model using these methods, and tested the output model on real-world photos. For the specific case study presented in this paper, our results show that a neural network trained with synthetic data (i.e., photorealistic renderings and BIM-based semantic labels) can identify building objects from photos without using photos in the training data. Future work can enhance the presented methods using available BIM models and renderings for more generalized mapping and description of photographed built environments.

* Accepted to the 40th Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA 2020) 
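A data pipeline for this setup essentially reduces to pairing each rendering with its BIM-derived semantic mask. The PyTorch `Dataset` below sketches that pairing; the directory layout and file naming are illustrative assumptions, not the BIMrAI release format.

```python
from pathlib import Path
from torch.utils.data import Dataset
from torchvision.io import read_image

class RenderLabelDataset(Dataset):
    """Pairs photorealistic renderings with BIM-derived semantic masks,
    assuming parallel 'renders/' and 'labels/' folders with matching names."""
    def __init__(self, root):
        self.renders = sorted(Path(root, "renders").glob("*.png"))
        self.labels = sorted(Path(root, "labels").glob("*.png"))
        assert len(self.renders) == len(self.labels)

    def __len__(self):
        return len(self.renders)

    def __getitem__(self, i):
        img = read_image(str(self.renders[i])).float() / 255.0  # rendering
        mask = read_image(str(self.labels[i])).long()           # semantic labels
        return img, mask
```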
  

Large Scale Landmark Recognition via Deep Metric Learning

Aug 29, 2019
Andrei Boiarov, Eduard Tyantov

This paper presents a novel approach for landmark recognition in images, which we have successfully deployed at Mail.ru. This method enables us to recognize famous places, buildings, monuments, and other landmarks in user photos. The main challenge lies in the fact that it is very difficult to give a precise definition of what is and what is not a landmark. Some buildings, statues, and natural objects are landmarks; others are not. There is also no database with a fairly large number of landmarks for training a recognition model. A key feature of using landmark recognition in a production environment is that the number of photos containing landmarks is extremely small. This is why the model should have a very low false positive rate as well as high recognition accuracy. We propose a metric learning-based approach that successfully deals with existing challenges and efficiently handles a large number of landmarks. Our method uses a deep neural network and requires a single inference pass, which makes it fast to use in production. We also describe an algorithm for cleaning the landmark database, which is essential for training a metric learning model. We provide an in-depth description of the basic components of our method, such as the neural network architecture, the learning strategy, and the features of our metric learning approach. We show the results of the proposed solutions in tests that emulate the distribution of photos with and without landmarks in a user collection, and compare our method with others in these tests. The described system has been deployed as part of a photo recognition solution at Cloud Mail.ru, the photo sharing and storage service of Mail.ru Group.

* Accepted at CIKM 2019 
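At inference time, a metric-learning system of this kind typically compares the photo's embedding against per-landmark centroids and rejects anything below a similarity threshold, which is what keeps the false positive rate low. A minimal sketch follows, with an illustrative threshold rather than the paper's tuned value.

```python
import numpy as np

def classify_landmark(embedding, centroids, names, threshold=0.8):
    """embedding: L2-normalized vector from the metric-learning network.
    centroids: (n_landmarks, dim) L2-normalized class centroids.
    Returns a landmark name, or None for non-landmark photos; the reject
    branch is what keeps the false positive rate low."""
    sims = centroids @ embedding          # cosine similarities
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else None
```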
  

One Shot 3D Photography

Sep 01, 2020
Johannes Kopf, Kevin Matzen, Suhib Alsisan, Ocean Quigley, Francis Ge, Yangming Chong, Josh Patterson, Jan-Michael Frahm, Shu Wu, Matthew Yu, Peizhao Zhang, Zijian He, Peter Vajda, Ayush Saraf, Michael Cohen

3D photography is a new medium that allows viewers to more fully experience a captured moment. In this work, we refer to a 3D photo as one that displays parallax induced by moving the viewpoint (as opposed to a stereo pair with a fixed viewpoint). 3D photos are static in time, like traditional photos, but are displayed with interactive parallax on mobile or desktop screens, as well as on Virtual Reality devices, where viewing additionally includes stereo. We present an end-to-end system for creating and viewing 3D photos, and the algorithmic and design choices therein. Our 3D photos are captured in a single shot and processed directly on a mobile device. The method starts by estimating depth from the 2D input image using a new monocular depth estimation network that is optimized for mobile devices. It performs competitively with the state of the art, but has lower latency and peak memory consumption and uses an order of magnitude fewer parameters. The resulting depth is lifted to a layered depth image (LDI), and new geometry is synthesized in parallax regions. We also synthesize color texture and structures in the parallax regions, using an inpainting network, likewise optimized for mobile devices, operating on the LDI directly. Finally, we convert the result into a mesh-based representation that can be efficiently transmitted and rendered even on low-end devices and over poor network connections. Altogether, the processing takes just a few seconds on a mobile device, and the result can be instantly viewed and shared. We perform extensive quantitative evaluation to validate our system and compare its new components against the current state-of-the-art.

* ACM Transactions on Graphics (Proceedings of SIGGRAPH 2020), Volume 39, Number 4, 2020 
* Project page: https://facebookresearch.github.io/one_shot_3d_photography/ Code: https://github.com/facebookresearch/one_shot_3d_photography 
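The final mesh conversion can be illustrated with the single-layer case: back-project each depth pixel through pinhole intrinsics and triangulate the pixel grid. The sketch below is a simplified stand-in for the paper's LDI-to-mesh step; it ignores layering and occlusion edges.

```python
import numpy as np

def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into a camera-space triangle mesh.
    fx, fy, cx, cy are pinhole intrinsics. Returns (verts, faces)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    verts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    # Two triangles per grid cell.
    a, b, c, d = idx[:-1, :-1], idx[:-1, 1:], idx[1:, :-1], idx[1:, 1:]
    faces = np.concatenate([
        np.stack([a, b, c], -1).reshape(-1, 3),
        np.stack([b, d, c], -1).reshape(-1, 3),
    ])
    return verts, faces
```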
  

Improving the Authentication with Built-in Camera Protocol Using Built-in Motion Sensors: A Deep Learning Solution

Jul 27, 2021
Cezara Benegui, Radu Tudor Ionescu

We propose an enhanced version of the Authentication with Built-in Camera (ABC) protocol by employing a deep learning solution based on built-in motion sensors. The standard ABC protocol identifies mobile devices based on the photo-response non-uniformity (PRNU) of the camera sensor, while also considering QR-code-based meta-information. During authentication, the user is required to take two photos that contain two QR codes presented on a screen. The presented QR code images also contain a unique probe signal, similar to a camera fingerprint, generated by the protocol. During verification, the server computes the fingerprint of the received photos and authenticates the user if (i) the probe signal is present, (ii) the metadata embedded in the QR codes is correct and (iii) the camera fingerprint is identified correctly. However, the protocol is vulnerable to forgery attacks when the attacker can compute the camera fingerprint from external photos, as shown in our preliminary work. In this context, we propose an enhancement for the ABC protocol based on motion sensor data, as an additional and passive authentication layer. Smartphones can be identified through their motion sensor data, which, unlike photos, is never posted by users on social media platforms, thus being more secure than using photographs alone. To this end, we transform motion signals into embedding vectors produced by deep neural networks and apply Support Vector Machines for the smartphone identification task. Our change to the ABC protocol results in a multi-modal protocol that lowers the false acceptance rate for the attack proposed in our previous work to a percentage as low as 0.07%.

* Accepted for publication in Mathematics 
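The added authentication layer boils down to extracting fixed-length embeddings from motion-sensor windows with a pretrained network, then training a classical SVM on them. A minimal scikit-learn sketch, assuming the embeddings have already been exported to disk (file names are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

# Embeddings extracted from accelerometer/gyroscope windows by a
# pretrained deep network (feature extraction not shown here).
train_emb = np.load("train_embeddings.npy")   # (n_samples, emb_dim)
train_lbl = np.load("train_labels.npy")       # 1 = enrolled device, 0 = other
clf = SVC(kernel="rbf").fit(train_emb, train_lbl)

test_emb = np.load("test_embeddings.npy")
decisions = clf.predict(test_emb)             # accept/reject per window
```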
  
