Jia-Bin Huang

$\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus

Apr 06, 2023

Hadi Alzayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

Abstract: Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, a fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures: specifically, an ultra-wide camera with a wider field of view and deeper DoF, and a higher-resolution primary camera with shallower DoF. In this work, we propose $\text{DC}^2$, a system for defocus control that synthetically varies camera aperture and focus distance and produces arbitrary defocus effects by fusing information from such a dual-camera system. Our key insight is to leverage a real-world smartphone camera dataset by using image refocus as a proxy task for learning to control defocus. Quantitative and qualitative evaluations on real-world data demonstrate our system's efficacy: we outperform the state of the art on defocus deblurring, bokeh rendering, and image refocus. Finally, we demonstrate creative post-capture defocus control enabled by our method, including tilt-shift and content-based defocus effects.

* CVPR 2023. See the project page at https://defocus-control.github.io
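The abstract notes that the primary camera has a shallower DoF than the ultra-wide camera. Textbook thin-lens optics explains why. The sketch below computes the standard thin-lens circle-of-confusion diameter and is not part of $\text{DC}^2$. The lens parameters in the usage loop are hypothetical, phone-like values chosen only for illustration.

```python
import math


def coc_diameter(focal_length, f_number, focus_dist, obj_dist):
    """Thin-lens circle-of-confusion (blur) diameter on the sensor.

    All distances share one unit (metres here) and are measured from the lens.
    focus_dist must exceed the focal length; obj_dist must be positive.
    """
    if focus_dist <= focal_length or obj_dist <= 0:
        raise ValueError("focus distance must exceed focal length; object distance must be positive")
    aperture = focal_length / f_number  # entrance-pupil diameter
    return aperture * focal_length * abs(obj_dist - focus_dist) / (
        obj_dist * (focus_dist - focal_length)
    )


if __name__ == "__main__":
    # Hypothetical lenses, both focused at 1 m, background at 3 m.
    lenses = {"primary (6 mm, f/1.8)": (0.006, 1.8), "ultra-wide (2 mm, f/2.2)": (0.002, 2.2)}
    for name, (f, n) in lenses.items():
        c = coc_diameter(f, n, focus_dist=1.0, obj_dist=3.0)
        print(f"{name}: blur diameter {c * 1e6:.2f} um")
```

Blur grows with the aperture diameter and, for a fixed focus distance, with the focal length. This is why the longer, faster primary lens defocuses the background much more than the ultra-wide lens.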

Consistent View Synthesis with Pose-Guided Diffusion Models

Mar 30, 2023

Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf

Abstract: Novel view synthesis from a single image has been a cornerstone problem for many virtual reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion, or they fail to generate consistent, high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.

* CVPR 2023. Project page: https://poseguided-diffusion.github.io/
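The abstract says the attention layer uses epipolar lines as constraints. The geometry behind that constraint is standard. For a source pixel x, the fundamental matrix F gives the line F x in the target view, and the corresponding point must lie on it. Below is a minimal sketch of that computation. It also includes a hypothetical `epipolar_mask` helper that marks target pixels near the line, one plausible way such a constraint could restrict attention. The helper and its `max_dist` threshold are assumptions, not the paper's layer.

```python
import math


def epipolar_line(F, u, v):
    """Line (a, b, c), with a*u' + b*v' + c = 0 in the target view, for source pixel (u, v).

    The line is scaled so that a^2 + b^2 = 1, which makes a*u' + b*v' + c a signed
    point-to-line distance in pixels.
    """
    x = (u, v, 1.0)
    a, b, c = (sum(F[r][k] * x[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("degenerate line: the pixel coincides with the source-view epipole")
    return a / norm, b / norm, c / norm


def epipolar_mask(F, src_pixel, target_pixels, max_dist=1.0):
    """Hypothetical attention mask: True for target pixels within max_dist of the epipolar line."""
    a, b, c = epipolar_line(F, *src_pixel)
    return [abs(a * u + b * v + c) <= max_dist for (u, v) in target_pixels]
```

As a usage example, take two cameras with identity intrinsics related by a pure horizontal translation. Then F is the cross-product matrix of t = (1, 0, 0), namely `[[0, 0, 0], [0, 0, -1], [0, 1, 0]]`. Every epipolar line is the horizontal row v' = v, the familiar rectified-stereo case.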

Progressively Optimized Local Radiance Fields for Robust View Synthesis

Mar 24, 2023

Andreas Meuleman, Yu-Lun Liu, Chen Gao, Jia-Bin Huang, Changil Kim, Min H. Kim, Johannes Kopf

Abstract: We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate camera poses pre-estimated by Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. To handle unknown poses, we jointly estimate the camera poses and the radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. To handle large unbounded scenes, we dynamically allocate new local radiance fields trained with frames within a temporal window. This further improves robustness (e.g., the method performs well even under moderate pose drift) and allows us to scale to large scenes. Our extensive evaluation on the Tanks and Temples dataset and our collected outdoor dataset, Static Hikes, shows that our approach compares favorably with the state of the art.

* CVPR 2023
* Project page: https://localrf.github.io/
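The abstract says new local radiance fields are allocated dynamically and trained on frames within a temporal window, but it does not state the allocation criterion. The sketch below assumes a simple greedy rule. A new field starts whenever the camera moves farther than a hypothetical `radius` from the current field's anchor. A hypothetical `overlap` parameter lets neighbouring fields share a few frames. Both are illustrative assumptions, not the paper's rule.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LocalField:
    """One local radiance field: an anchor position plus the frames used to train it."""
    center: tuple
    frames: list = field(default_factory=list)


def allocate_local_fields(camera_centers, radius, overlap=0):
    """Greedily assign frames (in temporal order) to local fields.

    A new field, anchored at the current camera, is started whenever the camera is
    farther than `radius` from the current field's anchor. The last `overlap` frames
    of the previous field are copied into the new one so adjacent fields share frames.
    """
    fields = []
    for i, c in enumerate(camera_centers):
        if not fields or math.dist(c, fields[-1].center) > radius:
            new_field = LocalField(center=tuple(c))
            if fields and overlap > 0:
                new_field.frames.extend(fields[-1].frames[-overlap:])
            fields.append(new_field)
        fields[-1].frames.append(i)
    return fields
```

Because each field only sees frames from a bounded stretch of the trajectory, its representational capacity is spent on a bounded region of the scene. That is the scaling argument the abstract makes.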

DisCO: Portrait Distortion Correction with Perspective-Aware 3D GANs

Feb 23, 2023

Zhixiang Wang, Yu-Lun Liu, Jia-Bin Huang, Shin'ichi Satoh, Sizhuo Ma, Guru Krishnan, Jian Wang

Abstract: Facial images captured at close distances often suffer from perspective distortion, resulting in exaggerated facial features and unnatural or unattractive appearances. We propose a simple yet effective method for correcting perspective distortion in a single close-up face image. We first perform GAN inversion on a perspective-distorted input facial image by jointly optimizing the camera intrinsic/extrinsic parameters and the face latent code. To address the ambiguity of this joint optimization, we develop focal length reparametrization, optimization scheduling, and geometric regularization. Re-rendering the portrait at a proper focal length and camera distance effectively corrects these distortions and produces more natural-looking results. Our experiments show that our method compares favorably against previous approaches in terms of visual quality. We showcase numerous examples validating the applicability of our method to portrait photos in the wild.

* Project website: https://portrait-disco.github.io/
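Re-rendering at a proper focal length and camera distance removes the distortion, and a plain pinhole model shows why. A feature a small distance z in front of the face plane is magnified by d / (d - z) relative to the face plane when the camera is at distance d. The sketch below illustrates only this geometry, not the GAN-inversion pipeline. The 3 cm nose offset and the distances are hypothetical numbers. `matched_focal_length` is the usual dolly-zoom rule that keeps the face plane the same size in the image.

```python
import math


def project(point, focal_length, cam_dist):
    """Pinhole projection of a face point.

    point = (x, y, z) in metres, with z measured toward the camera from the face plane;
    cam_dist is the camera's distance to the face plane.
    """
    x, y, z = point
    depth = cam_dist - z
    if depth <= 0:
        raise ValueError("point is at or behind the camera")
    return focal_length * x / depth, focal_length * y / depth


def matched_focal_length(f_ref, d_ref, d_new):
    """Focal length that keeps the face plane (z = 0) the same image size at distance d_new."""
    return f_ref * d_new / d_ref


def relative_magnification(z, cam_dist):
    """How much larger a feature at depth offset z appears than one on the face plane."""
    return cam_dist / (cam_dist - z)
```

With these numbers, a nose tip 3 cm in front of the face plane appears about 11% too large at 0.3 m (0.3 / 0.27) but only about 2% too large at 1.5 m (1.5 / 1.47). Meanwhile, the matched focal length keeps the overall face size unchanged.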

Text-driven Visual Synthesis with Latent Diffusion Prior

Feb 16, 2023

Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang

Abstract: There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models, enabling versatile downstream applications such as 3D object synthesis from text, image editing, and customized generation. We present a generic approach that uses latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve on this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three applications: text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show that our method compares favorably against baselines.

* Project website: https://latent-diffusion-prior.github.io/
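The abstract names two losses but does not specify their exact form. The sketch below is a minimal illustration under two assumptions. The feature matching term is taken to be an L1 distance averaged within each decoder layer and summed over layers. The KL term is assumed to be the VAE-style divergence of a diagonal Gaussian N(mu, exp(logvar)) from a standard normal. Both choices are illustrative and may differ from the paper.

```python
import math


def feature_matching_loss(feats_a, feats_b):
    """Per-layer mean absolute difference, summed over layers (assumed L1 form).

    feats_a / feats_b: lists of layers, each a flat list of decoder activations.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both inputs must have the same number of layers")
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        if len(layer_a) != len(layer_b):
            raise ValueError("matching layers must have the same size")
        total += sum(abs(a - b) for a, b in zip(layer_a, layer_b)) / len(layer_a)
    return total


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))
```

The KL term is zero exactly when the predicted latents have zero mean and unit variance. It grows as they drift away from that, which is the stabilizing pull the abstract attributes to it.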

In-N-Out: Face Video Inversion and Editing with Volumetric Decomposition

Feb 09, 2023

Yiran Xu, Zhixin Shu, Cameron Smith, Jia-Bin Huang, Seoung Wug Oh

Abstract: 3D-aware GANs offer new capabilities for creative content editing, such as view synthesis, while preserving the editing capability of their 2D counterparts. Using GAN inversion, these methods can reconstruct an image or a video by optimizing/predicting a latent code and achieve semantic editing by manipulating the latent code. However, a model pre-trained on a face dataset (e.g., FFHQ) often has difficulty handling faces with out-of-distribution (OOD) objects (e.g., heavy makeup or occlusions). We address this issue by explicitly modeling OOD objects in face videos. Our core idea is to represent the face in a video using two neural radiance fields, one for in-distribution and the other for out-of-distribution data, and to compose them for reconstruction. Such explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate our method's reconstruction accuracy and editability on challenging real videos and show favorable results against other baselines.

* Project page: https://in-n-out-3d.github.io/
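The abstract composes two radiance fields, one in-distribution and one out-of-distribution, for reconstruction. A common way to compose radiance fields is to add their densities at each sample and take a density-weighted average of their colours, then volume-render as usual. The sketch below shows that generic scheme. The paper's exact composition may differ.

```python
import math


def compose(sigma_a, color_a, sigma_b, color_b):
    """Merge two fields at one sample: densities add, colours are density-weighted."""
    sigma = sigma_a + sigma_b
    if sigma == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    color = tuple((sigma_a * ca + sigma_b * cb) / sigma for ca, cb in zip(color_a, color_b))
    return sigma, color


def render_ray(samples, deltas):
    """Standard volume rendering along one ray.

    samples: list of (sigma, (r, g, b)); deltas: spacing between consecutive samples.
    Returns the accumulated colour and the ray's total opacity.
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for (sigma, color), delta in zip(samples, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(rgb), 1.0 - transmittance
```

If the out-of-distribution field has zero density wherever there is no occluder or makeup, the composed rendering reduces to the in-distribution field alone. This is what lets the two parts be edited separately.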

Shape-aware Text-driven Layered Video Editing

Jan 30, 2023

Yao-Chih Lee, Ji-Ze Genevieve Jang, Yi-Ting Chen, Elizabeth Qiu, Jia-Bin Huang

Abstract: Temporal consistency is essential for video editing applications. Existing work on layered representations of videos allows edits to be propagated consistently to each frame. These methods, however, can only edit object appearance, not object shape, because they use a fixed UV mapping field for the texture atlas. We present a shape-aware, text-driven video editing method to tackle this challenge. To handle shape changes in video editing, we first propagate the deformation field between the input and edited keyframe to all frames. We then leverage a pre-trained text-conditioned diffusion model as guidance for refining shape distortion and completing unseen regions. The experimental results demonstrate that our method achieves shape-aware, consistent video editing and compares favorably with the state of the art.

* Project page: https://text-video-edit.github.io/
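Propagating a deformation field to other frames ultimately means resampling image content through a dense offset field. The sketch below is a generic backward warp with bilinear sampling and border clamping, shown only to illustrate that resampling step. It is not the paper's atlas-based propagation. For simplicity, it assumes grayscale images stored as nested Python lists and a per-pixel offset field `flow[y][x] = (dx, dy)`.

```python
import math


def bilinear(img, x, y):
    """Sample a grayscale image (list of rows) at a real-valued position, clamping at the borders."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1.0 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1.0 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1.0 - ay) * top + ay * bottom


def backward_warp(img, flow):
    """Output pixel (x, y) takes the source value at (x + dx, y + dy), where flow[y][x] = (dx, dy)."""
    return [
        [bilinear(img, x + flow[y][x][0], y + flow[y][x][1]) for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

Backward warping is used here because every output pixel receives exactly one value, which avoids the holes that forward splatting can leave.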

HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling

Jan 05, 2023

Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhoefer, Johannes Kopf, Matthew O'Toole, Changil Kim

Abstract: Volumetric scene representations enable photorealistic view synthesis for static scenes and form the basis of several existing 6-DoF video techniques. However, the volume rendering procedures that drive these representations necessitate careful trade-offs in terms of quality, rendering speed, and memory efficiency. In particular, existing methods fail to simultaneously achieve real-time performance, a small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel, a novel 6-DoF video representation. The two core components of HyperReel are (1) a ray-conditioned sample prediction network that enables high-fidelity, high-frame-rate rendering at high resolutions and (2) a compact and memory-efficient dynamic volume representation. Our 6-DoF video pipeline achieves the best visual quality compared to prior and contemporary approaches, with small memory requirements, while also rendering at up to 18 frames per second at megapixel resolution without any custom CUDA code.

* Project page: https://hyperreel.github.io/
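A ray-conditioned sample prediction network helps rendering speed because volume rendering cost scales with the number of samples per ray. If a few samples land where the density is, they give nearly the same result as dense marching. The toy below renders a thin slab with 512 uniform samples and with 8 samples from a stand-in predictor. The predictor is hypothetical and hard-coded to the slab's location. In HyperReel the sample locations come from a learned network conditioned on the ray, and this sketch does not reproduce how that network works.

```python
import math


def render(density_fn, color, depths, deltas):
    """Accumulate opacity-weighted colour along one ray at the given sample depths."""
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for t, delta in zip(depths, deltas):
        alpha = 1.0 - math.exp(-density_fn(t) * delta)
        weight = transmittance * alpha
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb


def slab_density(t):
    """Toy scene: a thin slab occupying depths 2.0 to 2.2 along the ray."""
    return 50.0 if 2.0 <= t <= 2.2 else 0.0


def uniform_samples(near, far, n):
    """Dense baseline: n evenly spaced midpoint samples between near and far."""
    delta = (far - near) / n
    return [near + (i + 0.5) * delta for i in range(n)], [delta] * n


def predicted_samples(ray):
    """Hypothetical stand-in for a learned ray-conditioned predictor.

    `ray` is unused here; the 8 samples are simply placed inside the slab,
    which a trained network would have to infer from the ray itself.
    """
    n, start, end = 8, 2.0, 2.2
    delta = (end - start) / n
    return [start + (i + 0.5) * delta for i in range(n)], [delta] * n
```

Both renderings accumulate almost the same colour (the slab is nearly opaque in each case), but the predicted version evaluates the scene 8 times per ray instead of 512.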

Robust Dynamic Radiance Fields

Jan 05, 2023

Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, Jia-Bin Huang

Abstract: Dynamic radiance field reconstruction methods aim to model the time-varying structure and appearance of a dynamic scene. Existing methods, however, assume that accurate camera poses can be reliably estimated by Structure from Motion (SfM) algorithms. These methods are thus unreliable, as SfM algorithms often fail or produce erroneous poses on challenging videos with highly dynamic objects, poorly textured surfaces, and rotating camera motion. We address this robustness issue by jointly estimating the static and dynamic radiance fields along with the camera parameters (poses and focal length). We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments. Our results show favorable performance over state-of-the-art dynamic view synthesis methods.

* Project page: https://robust-dynrf.github.io/

ClimateNeRF: Physically-based Neural Rendering for Extreme Climate Synthesis

Nov 26, 2022

Yuan Li, Zhi-Hao Lin, David Forsyth, Jia-Bin Huang, Shenlong Wang

Abstract: Physical simulations produce excellent predictions of weather effects. Neural radiance fields produce state-of-the-art scene models. We describe a novel NeRF-editing procedure that can fuse physical simulations with NeRF models of scenes, producing realistic movies of physical phenomena in those scenes. Our application, ClimateNeRF, allows people to visualize what climate change outcomes will do to them. ClimateNeRF allows us to render realistic weather effects, including smog, snow, and flood. Results can be controlled with physically meaningful variables like water level. Qualitative and quantitative studies show that our simulated results are significantly more realistic than those from state-of-the-art 2D image editing and 3D NeRF stylization.

* Project page: https://climatenerf.github.io/
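The abstract says flood results can be controlled by a water level. A minimal sketch of how such a parameter can enter a renderer follows. Each camera ray is intersected with a horizontal water plane and compared with the depth of the scene surface along that ray. Water is shown where the plane is hit first, and the reflected direction is what a renderer would trace next. The sketch assumes y-up world coordinates, a perfectly flat water surface, and a scene depth already rendered from the NeRF. It is not the ClimateNeRF implementation.

```python
import math


def water_intersection(origin, direction, water_level):
    """Distance t > 0 along the ray to the plane y = water_level, or None if there is no hit."""
    oy, dy = origin[1], direction[1]
    if abs(dy) < 1e-12:
        return None  # ray is parallel to the water surface
    t = (water_level - oy) / dy
    return t if t > 0 else None


def hits_water_first(origin, direction, scene_depth, water_level):
    """True if the water surface is closer along the ray than the scene surface."""
    t = water_intersection(origin, direction, water_level)
    return t is not None and t < scene_depth


def reflect(direction, normal=(0.0, 1.0, 0.0)):
    """Mirror a direction about a unit surface normal: r = d - 2 (d . n) n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))
```

Raising `water_level` moves the intersection closer to the camera, so more pixels switch from scene to water. This matches the kind of physically meaningful control the abstract describes.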