Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Neural Animation and Reenactment of Human Actor Videos

Sep 11, 2018
Lingjie Liu, Weipeng Xu, Michael Zollhoefer, Hyeongwoo Kim, Florian Bernard, Marc Habermann, Wenping Wang, Christian Theobalt

We propose a method for generating (near) video-realistic animations of real humans under user control. In contrast to conventional human character rendering, we do not require the availability of a production-quality photo-realistic 3D model of the human, but instead rely on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person. With that, our approach significantly reduces production cost compared to conventional rendering approaches based on production-quality 3D models, and can also be used to realistically edit existing videos. Technically, this is achieved by training a neural network that translates simple synthetic images of a human character into realistic imagery. For training our networks, we first track the 3D motion of the person in the video using the template model, and subsequently generate a synthetically rendered version of the video. These images are then used to train a conditional generative adversarial network that translates synthetic images of the 3D model into realistic imagery of the human. We evaluate our method for the reenactment of another person that is tracked in order to obtain the motion data, and show video results generated from artist-designed skeleton motion. Our results outperform the state-of-the-art in learning-based human image synthesis. Project page: http://gvv.mpi-inf.mpg.de/projects/wxu/HumanReenactment/

* Project page: http://gvv.mpi-inf.mpg.de/projects/wxu/HumanReenactment/ 
  
Access Paper or Ask Questions

Age Progression/Regression by Conditional Adversarial Autoencoder

Mar 28, 2017
Zhifei Zhang, Yang Song, Hairong Qi

"If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5?" The answer is probably a "No." Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples is required. In addition, given an unlabeled image, the generative model can directly produce the image with desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.

* Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) 
  
Access Paper or Ask Questions

Reading Between the Pixels: Photographic Steganography for Camera Display Messaging

Apr 06, 2016
Eric Wengrowski, Kristin Dana, Marco Gruteser, Narayan Mandayam

We exploit human color metamers to send light-modulated messages less visible to the human eye, but recoverable by cameras. These messages are a key component to camera-display messaging, such as handheld smartphones capturing information from electronic signage. Each color pixel in the display image is modified by a particular color gradient vector. The challenge is to find the color gradient that maximizes camera response, while minimizing human response. The mismatch in human spectral and camera sensitivity curves creates an opportunity for hidden messaging. Our approach does not require knowledge of these sensitivity curves, instead we employ a data-driven method. We learn an ellipsoidal partitioning of the six-dimensional space of colors and color gradients. This partitioning creates metamer sets defined by the base color at the display pixel and the color gradient direction for message encoding. We sample from the resulting metamer sets to find color steps for each base color to embed a binary message into an arbitrary image with reduced visible artifacts. Unlike previous methods that rely on visually obtrusive intensity modulation, we embed with color so that the message is more hidden. Ordinary displays and cameras are used without the need for expensive LEDs or high speed devices. The primary contribution of this work is a framework to map the pixels in an arbitrary image to a metamer pair for steganographic photo messaging.

* 16 pages with references 8 tables and figures 
  
Access Paper or Ask Questions

Prompt-aligned Gradient for Prompt Tuning

May 30, 2022
Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang

Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt", e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]". Therefore, prompt shows a great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the prompt-based similarity measure. However, we find a common failure that improper fine-tuning may not only undermine the prompt's inherent prediction for the task-related classes, but also for other classes in the VLM vocabulary. Existing methods still address this problem by using traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompt. We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned (or non-conflicting) to the "general direction", which is represented as the gradient of the KL loss of the pre-defined prompt prediction. Extensive experiments demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Codes are available at https://github.com/BeierZhu/Prompt-align.

  
Access Paper or Ask Questions

Captcha Attack: Turning Captchas Against Humanity

Jan 13, 2022
Mauro Conti, Luca Pajola, Pier Paolo Tricomi

Nowadays, people generate and share massive content on online platforms (e.g., social networks, blogs). In 2021, the 1.9 billion daily active Facebook users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spreading of inappropriate content (e.g., hate speech, nudity images). Based on deep learning (DL) advances, Automatic Content Moderators (ACM) help human moderators handle high data volume. Despite their advantages, attackers can exploit weaknesses of DL components (e.g., preprocessing, model) to affect their performance. Therefore, an attacker can leverage such techniques to spread inappropriate content by evading ACM. In this work, we propose CAPtcha Attack (CAPA), an adversarial technique that allows users to spread inappropriate text online by evading ACM controls. CAPA, by generating custom textual CAPTCHAs, exploits ACM's careless design implementations and internal procedures vulnerabilities. We test our attack on real-world ACM, and the results confirm the ferocity of our simple yet effective attack, reaching up to a 100% evasion success in most cases. At the same time, we demonstrate the difficulties in designing CAPA mitigations, opening new challenges in CAPTCHAs research area.

* Currently under submission 
  
Access Paper or Ask Questions

Captcha Attack:Turning Captchas Against Humanity

Jan 11, 2022
Mauro Conti, Luca Pajola, Pier Paolo Tricomi

Nowadays, people generate and share massive content on online platforms (e.g., social networks, blogs). In 2021, the 1.9 billion daily active Facebook users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spreading of inappropriate content (e.g., hate speech, nudity images). Based on deep learning (DL) advances, Automatic Content Moderators (ACM) help human moderators handle high data volume. Despite their advantages, attackers can exploit weaknesses of DL components (e.g., preprocessing, model) to affect their performance. Therefore, an attacker can leverage such techniques to spread inappropriate content by evading ACM. In this work, we propose CAPtcha Attack (CAPA), an adversarial technique that allows users to spread inappropriate text online by evading ACM controls. CAPA, by generating custom textual CAPTCHAs, exploits ACM's careless design implementations and internal procedures vulnerabilities. We test our attack on real-world ACM, and the results confirm the ferocity of our simple yet effective attack, reaching up to a 100% evasion success in most cases. At the same time, we demonstrate the difficulties in designing CAPA mitigations, opening new challenges in CAPTCHAs research area.

* Currently under submission 
  
Access Paper or Ask Questions

Stacked Deep Multi-Scale Hierarchical Network for Fast Bokeh Effect Rendering from a Single Image

May 15, 2021
Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Anil Kumar Tiwari

The Bokeh Effect is one of the most desirable effects in photography for rendering artistic and aesthetic photos. Usually, it requires a DSLR camera with different aperture and shutter settings and certain photography skills to generate this effect. In smartphones, computational methods and additional sensors are used to overcome the physical lens and sensor limitations to achieve such effect. Most of the existing methods utilized additional sensor's data or pretrained network for fine depth estimation of the scene and sometimes use portrait segmentation pretrained network module to segment salient objects in the image. Because of these reasons, networks have many parameters, become runtime intensive and unable to run in mid-range devices. In this paper, we used an end-to-end Deep Multi-Scale Hierarchical Network (DMSHN) model for direct Bokeh effect rendering of images captured from the monocular camera. To further improve the perceptual quality of such effect, a stacked model consisting of two DMSHN modules is also proposed. Our model does not rely on any pretrained network module for Monocular Depth Estimation or Saliency Detection, thus significantly reducing the size of model and run time. Stacked DMSHN achieves state-of-the-art results on a large scale EBB! dataset with around 6x less runtime compared to the current state-of-the-art model in processing HD quality images.

* Accepted to MAI workshop, CVPR 2021. Code and models: https://github.com/saikatdutta/Stacked_DMSHN_bokeh 
  
Access Paper or Ask Questions

Blending Generative Adversarial Image Synthesis with Rendering for Computer Graphics

Jul 31, 2020
Ekim Yurtsever, Dongfang Yang, Ibrahim Mert Koc, Keith A. Redmill

Conventional computer graphics pipelines require detailed 3D models, meshes, textures, and rendering engines to generate 2D images from 3D scenes. These processes are labor-intensive. We introduce Hybrid Neural Computer Graphics (HNCG) as an alternative. The contribution is a novel image formation strategy to reduce the 3D model and texture complexity of computer graphics pipelines. Our main idea is straightforward: Given a 3D scene, render only important objects of interest and use generative adversarial processes for synthesizing the rest of the image. To this end, we propose a novel image formation strategy to form 2D semantic images from 3D scenery consisting of simple object models without textures. These semantic images are then converted into photo-realistic RGB images with a state-of-the-art conditional Generative Adversarial Network (cGAN) based image synthesizer trained on real-world data. Meanwhile, objects of interest are rendered using a physics-based graphics engine. This is necessary as we want to have full control over the appearance of objects of interest. Finally, the partially-rendered and cGAN synthesized images are blended with a blending GAN. We show that the proposed framework outperforms conventional rendering with ablation and comparison studies. Semantic retention and Fr\'echet Inception Distance (FID) measurements were used as the main performance metrics.

  
Access Paper or Ask Questions

Security and Privacy Preserving Deep Learning

Jun 29, 2020
Saichethan Miriyala Reddy, Saisree Miriyala

Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users personal, highly sensitive data such as photos and voice recordings are kept indefinitely by the companies that collect it. Users can neither delete it nor restrict the purposes for which it is used. So, data privacy has been a very important concern for governments and companies these days. It gives rise to a very interesting challenge since on the one hand, we are pushing further and further for high-quality models and accessible data, but on the other hand, we need to keep data safe from both intentional and accidental leakage. The more personal the data is it is more restricted it means some of the most important social issues cannot be addressed using machine learning because researchers do not have access to proper training data. But by learning how to machine learning that protects privacy we can make a huge difference in solving many social issues like curing disease etc. Deep neural networks are susceptible to various inference attacks as they remember information about their training data. In this chapter, we introduce differential privacy, which ensures that different kinds of statistical analyses dont compromise privacy and federated learning, training a machine learning model on a data to which we do not have access to.

  
Access Paper or Ask Questions
<<
>>