Ira Kemelmacher-Shlizerman

Inverse Painting: Reconstructing The Painting Process

Sep 30, 2024

Bowei Chen, Yifan Wang, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz

Abstract: Given an input painting, we reconstruct a time-lapse video of how it may have been painted. We formulate this as an autoregressive image generation problem, in which an initially blank "canvas" is iteratively updated. The model learns from real artists by training on many painting videos. Our approach incorporates text and region understanding to define a set of painting "instructions" and updates the canvas with a novel diffusion-based renderer. The method extrapolates beyond the limited set of acrylic-style paintings on which it was trained, showing plausible results for a wide range of artistic styles and genres.

* Project Page: https://inversepainting.github.io
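The autoregressive formulation can be illustrated with a toy loop in which each new frame depends on the previous canvas. This is a minimal sketch under stated assumptions, not the authors' method: the hypothetical `render_step` simply copies one rectangular region of the target onto the canvas, standing in for the paper's diffusion-based renderer, and the fixed list of regions stands in for the learned text-and-region "instructions".

```python
import numpy as np

def render_step(canvas, target, region):
    """Hypothetical stand-in for the diffusion-based renderer:
    paints one rectangular region (y0, y1, x0, x1) of the target onto the canvas."""
    y0, y1, x0, x1 = region
    updated = canvas.copy()
    updated[y0:y1, x0:x1] = target[y0:y1, x0:x1]
    return updated

def reconstruct_process(target, regions):
    """Autoregressive loop: start from a blank canvas and apply one
    'instruction' (here, a region) per step, keeping every frame."""
    canvas = np.ones_like(target)  # blank white canvas, values in [0, 1]
    frames = [canvas]
    for region in regions:
        canvas = render_step(canvas, target, region)  # next frame is conditioned on the previous one
        frames.append(canvas)
    return frames

# Three assumed "instructions" that together cover a 64x64 painting.
target = np.random.default_rng(0).random((64, 64, 3))
regions = [(0, 32, 0, 64), (32, 64, 0, 32), (32, 64, 32, 64)]
frames = reconstruct_process(target, regions)
```

The returned `frames` list plays the role of the time-lapse video: the first frame is the blank canvas and the last one is the finished painting.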

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Aug 27, 2024

Xiaojuan Wang, Boyang Zhou, Brian Curless, Ira Kemelmacher-Shlizerman, Aleksander Holynski, Steven M. Seitz

Abstract: We present a method for generating video sequences with coherent motion between a pair of input keyframes. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for keyframe interpolation, i.e., to produce a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backward in time from a single input image. This model (along with the original forward-moving model) is then used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.

* Project page: https://svd-keyframe-interpolation.github.io/
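The dual-directional idea can be sketched at the level of a single denoising step. In this toy illustration, `forward_est` stands for the forward model's estimate conditioned on the first keyframe, and `backward_est` for the fine-tuned backward model's estimate conditioned on the second keyframe, ordered from that keyframe back in time. Reversing the backward estimate along the time axis aligns it with the forward one before fusing. The plain average used here is an assumption for illustration, not necessarily the paper's exact fusion rule.

```python
import numpy as np

def fuse_estimates(forward_est, backward_est):
    """Combine overlapping estimates from the two sampling directions.

    forward_est:  (T, H, W, C) frames predicted forward from keyframe 0.
    backward_est: (T, H, W, C) frames predicted backward from keyframe 1,
                  so backward_est[0] is the frame adjacent to keyframe 1.
    """
    aligned_backward = backward_est[::-1]  # flip time so both run from keyframe 0 to keyframe 1
    return 0.5 * (forward_est + aligned_backward)  # assumed fusion rule: plain average

# Toy check: frames filled with their time index.
T = 5
forward_est = np.arange(T, dtype=float).reshape(T, 1, 1, 1) * np.ones((T, 2, 2, 3))
backward_est = forward_est[::-1]  # a backward estimate perfectly consistent with the forward one
fused = fuse_estimates(forward_est, backward_est)
```

When the two directions agree, as in the toy check, fusion returns the shared estimate; when they disagree, each frame is pulled toward a compromise between the two keyframe-conditioned predictions.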

M&M VTO: Multi-Garment Virtual Try-On and Editing

Jun 06, 2024

Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman

Abstract: We present M&M VTO, a mix-and-match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person. An example input includes an image of a shirt, an image of a pair of pants, the text "rolled sleeves, shirt tucked in", and an image of a person. The output is a visualization of how those garments (in the desired layout) would look on the given person. Key contributions of our method are: 1) a single-stage diffusion-based model, with no super-resolution cascading, that can mix and match multiple garments at 1024x512 resolution while preserving and warping intricate garment details; 2) an architecture design (VTO UNet Diffusion Transformer) that disentangles denoising from person-specific features, enabling a highly effective finetuning strategy for identity preservation (a 6MB model per individual vs. 4GB with, e.g., DreamBooth finetuning), which addresses a common identity-loss problem in current virtual try-on methods; and 3) layout control for multiple garments via text inputs, using PaLI-3 finetuned specifically for the virtual try-on task. Experimental results indicate that M&M VTO achieves state-of-the-art performance both qualitatively and quantitatively, and opens up new opportunities for language-guided and multi-garment virtual try-on.

* CVPR 2024 Highlight. Project website: https://mmvto.github.io/

Don't Look at the Camera: Achieving Perceived Eye Contact

Apr 26, 2024

Alice Gao, Samyukta Jayakumar, Marcello Maniglia, Brian Curless, Ira Kemelmacher-Shlizerman, Aaron R. Seitz, Steven M. Seitz

Abstract: We consider the question of how best to achieve the perception of eye contact when a person is captured by a camera and then rendered on a 2D display. For single subjects photographed by a camera, conventional wisdom holds that looking directly into the camera achieves eye contact. Through empirical user studies, we show that it is instead preferable to look just below the camera lens. We quantitatively assess where subjects should direct their gaze relative to a camera lens to optimize the perception that they are making eye contact.

HRTF Estimation in the Wild

Nov 06, 2023

Vivek Jayaram, Ira Kemelmacher-Shlizerman, Steven M. Seitz

Abstract: Head-Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. We imagine a world where your personalized HRTF can be determined by capturing data through earbuds in everyday environments. In this paper, we propose a novel approach for deriving personalized HRTFs that only relies on in-the-wild binaural recordings and head tracking data. By analyzing how sounds change as the user rotates their head through different environments with different noise sources, we can accurately estimate their personalized HRTF. Our results show that our predicted HRTFs closely match ground-truth HRTFs measured in an anechoic chamber. Furthermore, listening studies demonstrate that our personalized HRTFs significantly improve sound localization and reduce front-back confusion in virtual environments. Our approach offers an efficient and accessible method for deriving personalized HRTFs and has the potential to greatly improve spatial audio experiences.

* 9 pages. Presented at UIST '23
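To see why a personalized HRTF matters for spatial audio, recall how an HRTF is typically used: a mono source is convolved with a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTF) for the desired direction, one per ear. The sketch below shows only this standard rendering step, not the paper's estimation method, and the toy impulse responses are made-up placeholders rather than measured data.

```python
import numpy as np

def render_binaural(source, hrir_left, hrir_right):
    """Spatialize a mono signal by convolving it with the left- and
    right-ear impulse responses for one direction.
    Assumes both HRIRs have the same length so the channels can be stacked."""
    left = np.convolve(source, hrir_left)
    right = np.convolve(source, hrir_right)
    return np.stack([left, right])  # shape (2, len(source) + len(hrir) - 1)

# Placeholder HRIRs: the right ear hears the sound 3 samples later and at
# half the amplitude, crudely mimicking a source on the listener's left.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.5])
source = np.array([1.0, 2.0, 3.0])
stereo = render_binaural(source, hrir_left, hrir_right)
```

The interaural delay and level difference encoded in the HRIR pair are what the listener uses to localize the sound, which is why a mismatched (non-personalized) pair can cause errors such as front-back confusion.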

Animating Street View

Oct 12, 2023

Mengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz

Abstract: We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image, insert moving objects with proper scale, angle, motion, and appearance, plan paths and traffic behavior, as well as render the scene with plausible occlusion and shadowing effects. The system achieves these by reconstructing the still image street scene, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes including regular still images and panoramas.

* SIGGRAPH Asia 2023 Conference Track
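Inserting objects "with proper scale" can be illustrated with a standard single-view geometry relation. Assuming a pinhole camera with a level optical axis above a flat ground plane (a simplifying assumption, not necessarily the paper's scene model), a ground point at depth Z appears f·y_c/Z pixels below the horizon, and an upright object of height H at that depth spans f·H/Z pixels, so the focal length f cancels. The function name and the example numbers below are illustrative.

```python
def pixel_height(object_height_m, foot_row, horizon_row, camera_height_m):
    """Image height (pixels) of an upright object standing on flat ground.

    Rows increase downward, so foot_row must lie below horizon_row.
    Follows from foot_row - horizon_row = f * camera_height / Z and
    pixel_height = f * object_height / Z.
    """
    if foot_row <= horizon_row:
        raise ValueError("object must touch the ground below the horizon")
    return object_height_m * (foot_row - horizon_row) / camera_height_m

# A 1.7 m pedestrian whose feet are 120 px below the horizon,
# seen from a camera assumed to be 2.5 m above the street.
h = pixel_height(1.7, foot_row=520, horizon_row=400, camera_height_m=2.5)
```

Here `h` is about 81.6 pixels; the same pedestrian placed lower in the image (closer to the camera) is drawn proportionally taller.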

Total Selfie: Generating Full-Body Selfies

Aug 28, 2023

Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz

Abstract: We present a method to generate full-body selfies: photos that you take of yourself, but that capture your whole body as if someone else had taken the photo from a few feet away. Our approach takes as input a pre-captured video of your body, a target pose photo, and a selfie-and-background pair for each location. We introduce a novel diffusion-based approach to combine all of this information into high-quality, well-composed photos of you with the desired pose and background.

* Project page: https://homes.cs.washington.edu/~boweiche/project_page/totalselfie/

TryOnDiffusion: A Tale of Two UNets

Jun 14, 2023

Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

Abstract: Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping it to accommodate significant body pose and shape changes across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body changes in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.

* CVPR 2023. Project page: https://tryondiffusion.github.io/
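Warping a garment implicitly via cross-attention can be sketched as follows: each spatial location of the person feature map queries all garment feature locations and gathers a softmax-weighted mix of garment features, so no explicit flow field is computed. This is a generic single-head cross-attention in NumPy with random placeholder projection matrices; the feature sizes and the choice of person features as queries are assumptions, and it is not the Parallel-UNet implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Single-head cross-attention.

    person_feats:  (N_p, D) flattened person feature map (queries).
    garment_feats: (N_g, D) flattened garment feature map (keys/values).
    Returns (N_p, D_v) outputs: for every person location, a weighted
    mix of garment features, i.e. an implicit, learned 'warp'.
    """
    q = person_feats @ w_q
    k = garment_feats @ w_k
    v = garment_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_p, N_g) scaled similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
D = 16
person = rng.standard_normal((8 * 8, D))   # an 8x8 person feature map, flattened
garment = rng.standard_normal((8 * 8, D))  # an 8x8 garment feature map, flattened
w_q, w_k, w_v = (rng.standard_normal((D, D)) for _ in range(3))
out, attn = cross_attention(person, garment, w_q, w_k, w_v)
```

Because every person location can attend to any garment location, large pose and shape differences between the two subjects do not require the garment to be aligned beforehand.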

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

May 04, 2023

Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman

Abstract: We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel finetuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.

* Project page: https://grail.cs.washington.edu/projects/dreampose/

PersonNeRF: Personalized Reconstruction from Photo Collections

Feb 16, 2023

Chung-Yi Weng, Pratul P. Srinivasan, Brian Curless, Ira Kemelmacher-Shlizerman

Abstract: We present PersonNeRF, a method that takes a collection of photos of a subject (e.g., Roger Federer) captured across multiple years with arbitrary body poses and appearances, and enables rendering the subject with arbitrary novel combinations of viewpoint, body pose, and appearance. PersonNeRF builds a customized neural volumetric 3D model of the subject that is able to render an entire space spanned by camera viewpoint, body pose, and appearance. A central challenge in this task is dealing with sparse observations: a given body pose is likely observed from only a single viewpoint with a single appearance, and a given appearance is observed under only a handful of different body poses. We address this issue by recovering a canonical T-pose neural volumetric representation of the subject that allows for changing appearance across different observations but uses a shared pose-dependent motion field across all observations. We demonstrate that this approach, along with regularization of the recovered volumetric geometry to encourage smoothness, recovers a model that renders compelling images from novel combinations of viewpoint, pose, and appearance from these challenging unstructured photo collections, outperforming prior work on free-viewpoint human rendering.

* Project Page: https://grail.cs.washington.edu/projects/personnerf/
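Neural volumetric models of this kind are typically rendered with the standard NeRF compositing rule: the color of a camera ray is C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where σ_i and c_i are the density and color at the i-th sample along the ray, δ_i is the spacing between samples, and T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance. The sketch below implements only this generic compositing step; PersonNeRF's canonical T-pose volume, pose-dependent motion field, and appearance conditioning are not modeled, and the sample values are made up.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Standard NeRF quadrature along one ray.

    sigmas: (S,) densities at the S samples, ordered near to far.
    colors: (S, 3) RGB at the samples.
    deltas: (S,) distances between consecutive samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j): light reaching sample i.
    accum = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    transmittance = np.exp(-accum)
    weights = transmittance * alphas
    return weights @ colors, weights

# An empty sample followed by a nearly opaque red sample.
sigmas = np.array([0.0, 100.0])
colors = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
deltas = np.array([0.5, 0.5])
rgb, weights = composite_ray(sigmas, colors, deltas)
```

In the example the empty first sample contributes nothing, so the ray takes on the color of the opaque red sample.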
