Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

One Shot 3D Photography

Aug 27, 2020
Johannes Kopf, Kevin Matzen, Suhib Alsisan, Ocean Quigley, Francis Ge, Yangming Chong, Josh Patterson, Jan-Michael Frahm, Shu Wu, Matthew Yu, Peizhao Zhang, Peter Vajda, Ayush Saraf, Michael Cohen

3D photography is a new medium that allows viewers to more fully experience a captured moment. In this work, we refer to a 3D photo as one that displays parallax induced by moving the viewpoint (as opposed to a stereo pair with a fixed viewpoint). 3D photos are static in time, like traditional photos, but are displayed with interactive parallax on mobile or desktop screens, as well as on Virtual Reality devices, where viewing it also includes stereo. We present an end-to-end system for creating and viewing 3D photos, and the algorithmic and design choices therein. Our 3D photos are captured in a single shot and processed directly on a mobile device. The method starts by estimating depth from the 2D input image using a new monocular depth estimation network that is optimized for mobile devices. It performs competitively to the state-of-the-art, but has lower latency and peak memory consumption and uses an order of magnitude fewer parameters. The resulting depth is lifted to a layered depth image, and new geometry is synthesized in parallax regions. We synthesize color texture and structures in the parallax regions as well, using an inpainting network, also optimized for mobile devices, on the LDI directly. Finally, we convert the result into a mesh-based representation that can be efficiently transmitted and rendered even on low-end devices and over poor network connections. Altogether, the processing takes just a few seconds on a mobile device, and the result can be instantly viewed and shared. We perform extensive quantitative evaluation to validate our system and compare its new components against the current state-of-the-art.

* ACM Transactions on Graphics (Proceedings of SIGGRAPH 2020), Volume 39, Number 4, 2020 
* Project page: https://facebookresearch.github.io/one_shot_3d_photography/ Code: https://github.com/facebookresearch/one_shot_3d_photography 
  

Improving the Authentication with Built-in Camera Protocol Using Built-in Motion Sensors: A Deep Learning Solution

Jul 27, 2021
Cezara Benegui, Radu Tudor Ionescu

We propose an enhanced version of the Authentication with Built-in Camera (ABC) protocol by employing a deep learning solution based on built-in motion sensors. The standard ABC protocol identifies mobile devices based on the photo-response non-uniformity (PRNU) of the camera sensor, while also considering QR-code-based meta-information. During authentication, the user is required to take two photos that contain two QR codes presented on a screen. The presented QR code images also contain a unique probe signal, similar to a camera fingerprint, generated by the protocol. During verification, the server computes the fingerprint of the received photos and authenticates the user if (i) the probe signal is present, (ii) the metadata embedded in the QR codes is correct and (iii) the camera fingerprint is identified correctly. However, the protocol is vulnerable to forgery attacks when the attacker can compute the camera fingerprint from external photos, as shown in our preliminary work. In this context, we propose an enhancement for the ABC protocol based on motion sensor data, as an additional and passive authentication layer. Smartphones can be identified through their motion sensor data, which, unlike photos, is never posted by users on social media platforms, thus being more secure than using photographs alone. To this end, we transform motion signals into embedding vectors produced by deep neural networks, applying Support Vector Machines for the smartphone identification task. Our change to the ABC protocol results in a multi-modal protocol that lowers the false acceptance rate for the attack proposed in our previous work to a percentage as low as 0.07%.

* Accepted for publication in Mathematics 
  

Improving the Authentication with Built-in Camera ProtocolUsing Built-in Motion Sensors: A Deep Learning Solution

Jul 22, 2021
Cezara Benegui, Radu Tudor Ionescu

We propose an enhanced version of the Authentication with Built-in Camera (ABC) protocol by employing a deep learning solution based on built-in motion sensors. The standard ABC protocol identifies mobile devices based on the photo-response non-uniformity (PRNU) of the camera sensor, while also considering QR-code-based meta-information. During authentication, the user is required to take two photos that contain two QR codes presented on a screen. The presented QR code images also contain a unique probe signal, similar to a camera fingerprint, generated by the protocol. During verification, the server computes the fingerprint of the received photos and authenticates the user if (i) the probe signal is present, (ii) the metadata embedded in the QR codes is correct and (iii) the camera fingerprint is identified correctly. However, the protocol is vulnerable to forgery attacks when the attacker can compute the camera fingerprint from external photos, as shown in our preliminary work. In this context, we propose an enhancement for the ABC protocol based on motion sensor data, as an additional and passive authentication layer. Smartphones can be identified through their motion sensor data, which, unlike photos, is never posted by users on social media platforms, thus being more secure than using photographs alone. To this end, we transform motion signals into embedding vectors produced by deep neural networks, applying Support Vector Machines for the smartphone identification task. Our change to the ABC protocol results in a multi-modal protocol that lowers the false acceptance rate for the attack proposed in our previous work to a percentage as low as 0.07%.

  

Face Search at Scale: 80 Million Gallery

Jul 28, 2015
Dayong Wang, Charles Otto, Anil K. Jain

Due to the prevalence of social media websites, one challenge facing computer vision researchers is to devise methods to process and search for persons of interest among the billions of shared photos on these websites. Facebook revealed in a 2013 white paper that its users have uploaded more than 250 billion photos, and are uploading 350 million new photos each day. Due to this humongous amount of data, large-scale face search for mining web images is both important and challenging. Despite significant progress in face recognition, searching a large collection of unconstrained face images has not been adequately addressed. To address this challenge, we propose a face search system which combines a fast search procedure, coupled with a state-of-the-art commercial off the shelf (COTS) matcher, in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using deep features generated from a convolutional neural network. The k candidates are re-ranked by combining similarities from deep features and the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that the deep features are competitive with state-of-the-art methods on unconstrained face recognition benchmarks (LFW and IJB-A). Further, the proposed face search system offers an excellent trade-off between accuracy and scalability on datasets consisting of millions of images. Additionally, in an experiment involving searching for face images of the Tsarnaev brothers, convicted of the Boston Marathon bombing, the proposed face search system could find the younger brother's (Dzhokhar Tsarnaev) photo at rank 1 in 1 second on a 5M gallery and at rank 8 in 7 seconds on an 80M gallery.

* 14 pages, 16 figures 
  

Fast Preprocessing for Robust Face Sketch Synthesis

Aug 01, 2017
Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang

Exemplar-based face sketch synthesis methods usually meet the challenging problem that input photos are captured in different lighting conditions from training photos. The critical step causing the failure is the search of similar patch candidates for an input photo patch. Conventional illumination invariant patch distances are adopted rather than directly relying on pixel intensity difference, but they will fail when local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjust the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness with ignorable computational cost.

* IJCAI 2017. Project page: http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sketch/index.html 
  

User Preference Prediction in Visual Data on Mobile Devices

Jul 10, 2019
A. V. Savchenko, K. V. Demochkin, I. S. Grechikhin

In this paper we consider the user modeling given the photos and videos from the gallery on a mobile device. We propose the novel user preference prediction engine based on scene understanding, object detection and face recognition. At first, all faces in a gallery are clustered and all private photos and videos with faces from large clusters are processed on the embedded system in offline mode. Other photos are sent to the remote server to be analyzed by very deep models. The visual features of each photo are aggregated into a single user descriptor using the neural attention block. The proposed pipeline is implemented for the Android mobile platform. Experimental results with a subset of Amazon Home and Kitchen, Places2 and Open Images datasets demonstrate the possibility to process images very efficiently without accuracy degradation.

* 5 pages; 2 figures 
  

Detecting People in Artwork with CNNs

Oct 27, 2016
Nicholas Westlake, Hongping Cai, Peter Hall

CNNs have massively improved performance in object detection in photographs. However research into object detection in artwork remains limited. We show state-of-the-art performance on a challenging dataset, People-Art, which contains people from photos, cartoons and 41 different artwork movements. We achieve this high performance by fine-tuning a CNN for this task, thus also demonstrating that training CNNs on photos results in overfitting for photos: only the first three or four layers transfer from photos to artwork. Although the CNN's performance is the highest yet, it remains less than 60\% AP, suggesting further work is needed for the cross-depiction problem. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46604-0_57

* 14 pages, plus 3 pages of references; 7 figures in ECCV 2016 Workshops 
  

Cross-Domain Object Matching with Model Selection

Dec 07, 2010
Makoto Yamada, Masashi Sugiyama

The goal of cross-domain object matching (CDOM) is to find correspondence between two sets of objects in different domains in an unsupervised way. Photo album summarization is a typical application of CDOM, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system. CDOM is usually formulated as finding a mapping from objects in one domain (photos) to objects in the other domain (frame) so that the pairwise dependency is maximized. A state-of-the-art CDOM method employs a kernel-based dependency measure, but it has a drawback that the kernel parameter needs to be determined manually. In this paper, we propose alternative CDOM methods that can naturally address the model selection problem. Through experiments on image matching, unpaired voice conversion, and photo album summarization tasks, the effectiveness of the proposed methods is demonstrated.

  

Zoom-RNN: A Novel Method for Person Recognition Using Recurrent Neural Networks

Sep 26, 2018
Sina Mokhtarzadeh Azar, Sajjad Azami, Mina Ghadimi Atigh, Mohammad Javadi, Ahmad Nickabadi

The overwhelming popularity of social media has resulted in bulk amounts of personal photos being uploaded to the internet every day. Since these photos are taken in unconstrained settings, recognizing the identities of people among the photos remains a challenge. Studies have indicated that utilizing evidence other than face appearance improves the performance of person recognition systems. In this work, we aim to take advantage of additional cues obtained from different body regions in a zooming in fashion for person recognition. Hence, we present Zoom-RNN, a novel method based on recurrent neural networks for combining evidence extracted from the whole body, upper body, and head regions. Our model is evaluated on a challenging dataset, namely People In Photo Albums (PIPA), and we demonstrate that employing our system improves the performance of conventional fusion methods by a noticeable margin.

* 8 pages, 5 figures 
  

Happy Travelers Take Big Pictures: A Psychological Study with Machine Learning and Big Data

Sep 22, 2017
Xuefeng Liang, Lixin Fan, Yuen Peng Loh, Yang Liu, Song Tong

In psychology, theory-driven researches are usually conducted with extensive laboratory experiments, yet rarely tested or disproved with big data. In this paper, we make use of 418K travel photos with traveler ratings to test the influential "broaden-and-build" theory, that suggests positive emotions broaden one's visual attention. The core hypothesis examined in this study is that positive emotion is associated with a wider attention, hence highly-rated sites would trigger wide-angle photographs. By analyzing travel photos, we find a strong correlation between a preference for wide-angle photos and the high rating of tourist sites on TripAdvisor. We are able to carry out this analysis through the use of deep learning algorithms to classify the photos into wide and narrow angles, and present this study as an exemplar of how big data and deep learning can be used to test laboratory findings in the wild.

  
<<
9
10
11
12
13
14
15
16
17
18
19
20
21
>>