Alexei A. Efros

Self-Supervised Policy Adaptation during Deployment

Jul 08, 2020

Nicklas Hansen, Yu Sun, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

Abstract: In most real-world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse environments from DeepMind Control suite and ViZDoom. Our method improves generalization in 25 out of 30 environments across various tasks, and outperforms domain randomization on a majority of environments.

* Project page: https://nicklashansen.github.io/PAD/ , Code: https://github.com/nicklashansen/policy-adaptation-during-deployment

Swapping Autoencoder for Deep Image Manipulation

Jul 01, 2020

Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

Abstract: Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.

Space-Time Correspondence as a Contrastive Random Walk

Jun 25, 2020

Allan Jabri, Andrew Owens, Alexei A. Efros

Abstract: This paper proposes a simple self-supervised approach for learning representations for visual correspondence from raw video. We cast correspondence as link prediction in a space-time graph constructed from a video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a node embedding in which pairwise similarity defines transition probabilities of a random walk. Prediction of long-range correspondence is efficiently computed as a walk along this graph. The embedding learns to guide the walk by placing high probability along paths of correspondence. Targets are formed without supervision, by cycle-consistency: we train the embedding to maximize the likelihood of returning to the initial node when walking along a graph constructed from a "palindrome" of frames. We demonstrate that the approach allows for learning representations from large unlabeled video. Despite its simplicity, the method outperforms the self-supervised state-of-the-art on a variety of label propagation tasks involving objects, semantic parts, and pose. Moreover, we show that self-supervised adaptation at test-time and edge dropout improve transfer for object-level correspondence.
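
The palindrome objective in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dot-product similarity, and the `temperature` parameter are assumptions made for illustration. Each frame is a list of node embeddings, pairwise similarities are turned into row-stochastic transition matrices with a softmax, the matrices are chained along the frames forward and then backward, and the loss is the negative log-probability of each walker returning to its starting node.

```python
import math

def softmax(row, temperature):
    # Numerically stable softmax over one row of similarities.
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transition(emb_a, emb_b, temperature=0.1):
    # P[i][j]: probability of stepping from node i (frame a) to node j (frame b).
    # Dot-product similarity is an assumption of this sketch.
    return [softmax([sum(x * y for x, y in zip(u, v)) for v in emb_b], temperature)
            for u in emb_a]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def cycle_loss(frames, temperature=0.1):
    # Requires at least two frames. Build the palindrome f0, f1, ..., fT, ..., f1, f0.
    palindrome = frames + frames[-2::-1]
    walk = None
    for a, b in zip(palindrome, palindrome[1:]):
        step = transition(a, b, temperature)
        walk = step if walk is None else matmul(walk, step)
    # Negative log-likelihood of returning to the initial node.
    n = len(walk)
    loss = -sum(math.log(walk[i][i]) for i in range(n)) / n
    return loss, walk
```

With embeddings that separate the nodes, almost all probability mass returns to the start and the loss is near zero; with identical embeddings every step is uniform and, for two nodes per frame, the loss equals log 2.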

RANSAC-Flow: generic two-stage image alignment

Apr 03, 2020

Xi Shen, François Darmon, Alexei A. Efros, Mathieu Aubry

Abstract: This paper considers the generic problem of dense alignment between two images, whether they be two frames of a video, two widely different views of a scene, two paintings depicting similar content, etc. Whereas each such task is typically addressed with a domain-specific solution, we show that a simple unsupervised approach performs surprisingly well across a range of tasks. Our main insight is that parametric and non-parametric alignment methods have complementary strengths. We propose a two-stage process: first, a feature-based parametric coarse alignment using one or more homographies, followed by non-parametric fine pixel-wise alignment. Coarse alignment is performed using RANSAC on off-the-shelf deep features. Fine alignment is learned in an unsupervised way by a deep network which optimizes a standard structural similarity metric (SSIM) between the two images, plus cycle-consistency. Despite its simplicity, our method shows competitive results on a range of tasks and datasets, including unsupervised optical flow on KITTI, dense correspondences on HPatches, two-view geometry estimation on YFCC100M, localization on Aachen Day-Night, and, for the first time, fine alignment of artworks on the Brueghel dataset. Our code and data are available at http://imagine.enpc.fr/~shenx/RANSAC-Flow/

* Project webpage: http://imagine.enpc.fr/~shenx/RANSAC-Flow/
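
The coarse stage described in the abstract above relies on RANSAC: repeatedly fit a parametric model to a minimal random sample of feature matches and keep the model with the most inliers. The sketch below shows that loop under a deliberately simplified assumption: the model is a 2D translation (one match per sample) instead of a homography, and the matches are plain point pairs, not deep features. The function name and parameters are hypothetical.

```python
import random

def ransac_translation(src, dst, iters=100, tol=1.0, seed=0):
    # src[i] -> dst[i] are putative matches, some of which are outliers.
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        # Minimal sample: a single match fully determines a translation.
        i = rng.randrange(len(src))
        dx, dy = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        # Count the matches that agree with this hypothesis within tol.
        inliers = [k for k, (s, d) in enumerate(zip(src, dst))
                   if abs(s[0] + dx - d[0]) <= tol and abs(s[1] + dy - d[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis (mean displacement).
    n = len(best_inliers)
    dx = sum(dst[k][0] - src[k][0] for k in best_inliers) / n
    dy = sum(dst[k][1] - src[k][1] for k in best_inliers) / n
    return (dx, dy), best_inliers
```

Replacing the translation with a homography would change only the minimal sample size (four matches) and the fitting step; the hypothesize-and-verify loop stays the same.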

CNN-generated images are surprisingly easy to spot for now

Dec 23, 2019

Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, Alexei A. Efros

Abstract: In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of architecture or dataset used. To test this, we collect a dataset consisting of fake images generated by 11 different CNN-based image generator models, chosen to span the space of commonly used architectures today (ProGAN, StyleGAN, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFakes, cascaded refinement networks, implicit maximum likelihood estimation, second-order attention super-resolution, seeing-in-the-dark). We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods (including the just released StyleGAN2). Our findings suggest the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis.

Test-Time Training for Out-of-Distribution Generalization

Oct 25, 2019

Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

Abstract: We introduce a general approach, called test-time training, for improving the performance of predictive models when test and training data come from different distributions. Test-time training turns a single unlabeled test instance into a self-supervised learning problem, on which we update the model parameters before making a prediction on this instance. We show that this simple idea leads to surprising improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts. Theoretical investigations on a convex model reveal helpful intuitions for when we can expect our approach to help.

Unsupervised Domain Adaptation through Self-Supervision

Sep 29, 2019

Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros

Abstract: This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data. Like much of previous work, we seek to align the learned representations of the source and target domains while preserving discriminability. The way we accomplish alignment is by learning to perform auxiliary self-supervised task(s) on both domains simultaneously. Each self-supervised task brings the two domains closer together along the direction relevant to that task. Training this jointly with the main task classifier on the source domain is shown to successfully generalize to the unlabeled target domain. The presented objective is straightforward to implement and easy to optimize. We achieve state-of-the-art results on four out of seven standard benchmarks, and competitive results on segmentation adaptation. We also demonstrate that our method composes well with another popular pixel-level adaptation method.

Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation

Sep 25, 2019

Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman

Abstract: We propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions, and shows a corresponding synthesized image to the user. This enables a feedback loop, where the user can edit their sketch based on the network's recommendations, visualizing both the completed shape and final rendered image while they draw. In order to use a single trained model across a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. Video available at our website: https://arnabgho.github.io/iSketchNFill/.

* ICCV 2019. Video available at https://youtu.be/T9xtpAMUDps

Detecting Photoshopped Faces by Scripting Photoshop

Jun 13, 2019

Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros

Abstract: Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop. We present a method for detecting one very popular Photoshop manipulation -- image warping applied to human faces -- using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself. We show that our model outperforms humans at the task of recognizing manipulated images, can predict the specific location of edits, and in some cases can be used to "undo" a manipulation to reconstruct the original, unedited image. We demonstrate that the system can be successfully applied to real, artist-created image manipulations.

Learning Correspondence from the Cycle-Consistency of Time

Apr 02, 2019

Xiaolong Wang, Allan Jabri, Alexei A. Efros

Abstract: We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation -- without finetuning -- across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.

* CVPR 2019 Oral. Project page: http://ajabri.github.io/timecycle
