Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jia-Bin Huang

Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Dec 12, 2024

Yue Feng, Vaibhav Sanjay, Spencer Lutz, Badour AlBahar, Songwei Ge, Jia-Bin Huang

Figure 1 for Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Figure 2 for Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Figure 3 for Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Figure 4 for Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors

Abstract:Automatically generating multiview illusions is a compelling challenge, where a single piece of visual content offers distinct interpretations from different viewing perspectives. Traditional methods, such as shadow art and wire art, create interesting 3D illusions but are limited to simple visual outputs (i.e., figure-ground or line drawing), restricting their artistic expressiveness and practical versatility. Recent diffusion-based illusion generation methods can generate more intricate designs but are confined to 2D images. In this work, we present a simple yet effective approach for creating 3D multiview illusions based on user-provided text prompts or images. Our method leverages a pre-trained text-to-image diffusion model to optimize the textures and geometry of neural 3D representations through differentiable rendering. When viewed from multiple angles, this produces different interpretations. We develop several techniques to improve the quality of the generated 3D multiview illusions. We demonstrate the effectiveness of our approach through extensive experiments and showcase illusion generation with diverse 3D forms.

* Project page: https://3d-multiview-illusion.github.io/

Via

Access Paper or Ask Questions

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Nov 27, 2024

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi, Chen Gao, Tuotuo Li, Qinbo Li, Ayush Saraf, Jia-Bin Huang, Johannes Kopf, Gordon Wetzstein(+1 more)

Figure 1 for Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Figure 2 for Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Figure 3 for Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Figure 4 for Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Abstract:3D Gaussian Splatting (3DGS) has recently emerged as a state-of-the-art 3D reconstruction and rendering technique due to its high-quality results and fast training and rendering time. However, pixels covered by the same Gaussian are always shaded in the same color up to a Gaussian falloff scaling factor. Furthermore, the finest geometric detail any individual Gaussian can represent is a simple ellipsoid. These properties of 3DGS greatly limit the expressivity of individual Gaussian primitives. To address these issues, we draw inspiration from texture and alpha mapping in traditional graphics and integrate it with 3DGS. Specifically, we propose a new generalized Gaussian appearance representation that augments each Gaussian with alpha~(A), RGB, or RGBA texture maps to model spatially varying color and opacity across the extent of each Gaussian. As such, each Gaussian can represent a richer set of texture patterns and geometric structures, instead of just a single color and ellipsoid as in naive Gaussian Splatting. Surprisingly, we found that the expressivity of Gaussians can be greatly improved by using alpha-only texture maps, and further augmenting Gaussians with RGB texture maps achieves the highest expressivity. We validate our method on a wide variety of standard benchmark datasets and our own custom captures at both the object and scene levels. We demonstrate image quality improvements over existing methods while using a similar or lower number of Gaussians.

* Project website: https://textured-gaussians.github.io/

Via

Access Paper or Ask Questions

Generative Omnimatte: Learning to Decompose Video into Layers

Nov 25, 2024

Yao-Chih Lee, Erika Lu, Sarah Rumbley, Michal Geyer, Jia-Bin Huang, Tali Dekel, Forrester Cole

Figure 1 for Generative Omnimatte: Learning to Decompose Video into Layers

Figure 2 for Generative Omnimatte: Learning to Decompose Video into Layers

Figure 3 for Generative Omnimatte: Learning to Decompose Video into Layers

Figure 4 for Generative Omnimatte: Learning to Decompose Video into Layers

Abstract:Given a video and a set of input object masks, an omnimatte method aims to decompose the video into semantically meaningful layers containing individual objects along with their associated effects, such as shadows and reflections. Existing omnimatte methods assume a static background or accurate pose and depth estimation and produce poor decompositions when these assumptions are violated. Furthermore, due to the lack of generative prior on natural videos, existing methods cannot complete dynamic occluded regions. We present a novel generative layered video decomposition framework to address the omnimatte problem. Our method does not assume a stationary scene or require camera pose or depth information and produces clean, complete layers, including convincing completions of occluded dynamic regions. Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object. We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset, and demonstrate high-quality decompositions and editing results for a wide range of casually captured videos containing soft shadows, glossy reflections, splashing water, and more.

* Project page: https://gen-omnimatte.github.io/

Via

Access Paper or Ask Questions

Planar Reflection-Aware Neural Radiance Fields

Nov 07, 2024

Chen Gao, Yipeng Wang, Changil Kim, Jia-Bin Huang, Johannes Kopf

Figure 1 for Planar Reflection-Aware Neural Radiance Fields

Figure 2 for Planar Reflection-Aware Neural Radiance Fields

Figure 3 for Planar Reflection-Aware Neural Radiance Fields

Figure 4 for Planar Reflection-Aware Neural Radiance Fields

Abstract:Neural Radiance Fields (NeRF) have demonstrated exceptional capabilities in reconstructing complex scenes with high fidelity. However, NeRF's view dependency can only handle low-frequency reflections. It falls short when handling complex planar reflections, often interpreting them as erroneous scene geometries and leading to duplicated and inaccurate scene representations. To address this challenge, we introduce a reflection-aware NeRF that jointly models planar reflectors, such as windows, and explicitly casts reflected rays to capture the source of the high-frequency reflections. We query a single radiance field to render the primary color and the source of the reflection. We propose a sparse edge regularization to help utilize the true sources of reflections for rendering planar reflections rather than creating a duplicate along the primary ray at the same depth. As a result, we obtain accurate scene geometry. Rendering along the primary ray results in a clean, reflection-free view, while explicitly rendering along the reflected ray allows us to reconstruct highly detailed reflections. Our extensive quantitative and qualitative evaluations of real-world datasets demonstrate our method's enhanced performance in accurately handling reflections.

Via

Access Paper or Ask Questions

Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats

Oct 03, 2024

Mingyang Xie, Haoming Cai, Sachin Shah, Yiran Xu, Brandon Y. Feng, Jia-Bin Huang, Christopher A. Metzler

Abstract:We introduce a simple yet effective approach for separating transmitted and reflected light. Our key insight is that the powerful novel view synthesis capabilities provided by modern inverse rendering methods (e.g.,~3D Gaussian splatting) allow one to perform flash/no-flash reflection separation using unpaired measurements -- this relaxation dramatically simplifies image acquisition over conventional paired flash/no-flash reflection separation methods. Through extensive real-world experiments, we demonstrate our method, Flash-Splat, accurately reconstructs both transmitted and reflected scenes in 3D. Our method outperforms existing 3D reflection separation methods, which do not leverage illumination control, by a large margin. Our project webpage is at https://flash-splat.github.io/.

Via

Access Paper or Ask Questions

Modeling Ambient Scene Dynamics for Free-view Synthesis

Jun 13, 2024

Meng-Li Shih, Jia-Bin Huang, Changil Kim, Rajvi Shah, Johannes Kopf, Chen Gao

Figure 1 for Modeling Ambient Scene Dynamics for Free-view Synthesis

Figure 2 for Modeling Ambient Scene Dynamics for Free-view Synthesis

Figure 3 for Modeling Ambient Scene Dynamics for Free-view Synthesis

Figure 4 for Modeling Ambient Scene Dynamics for Free-view Synthesis

Abstract:We introduce a novel method for dynamic free-view synthesis of an ambient scenes from a monocular capture bringing a immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require multi-camera captures, and often fail to generalize to unseen motions, limiting their practical application. Our approach overcomes these constraints by leveraging the periodicity of ambient motions to learn the motion trajectory model, coupled with careful regularization. We also propose important practical strategies to improve the visual quality of the baseline 3DGS static reconstructions and to improve memory efficiency critical for GPU-memory intensive learning. We demonstrate high-quality photorealistic novel view synthesis of several ambient natural scenes with intricate textures and fine structural elements.

Via

Access Paper or Ask Questions

Rethinking Score Distillation as a Bridge Between Image Distributions

Jun 13, 2024

David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

Figure 1 for Rethinking Score Distillation as a Bridge Between Image Distributions

Figure 2 for Rethinking Score Distillation as a Bridge Between Image Distributions

Figure 3 for Rethinking Score Distillation as a Bridge Between Image Distributions

Figure 4 for Rethinking Score Distillation as a Bridge Between Image Distributions

Abstract:Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.

* Project webpage: https://sds-bridge.github.io/

Via

Access Paper or Ask Questions

Coherent Zero-Shot Visual Instruction Generation

Jun 06, 2024

Quynh Phung, Songwei Ge, Jia-Bin Huang

Figure 1 for Coherent Zero-Shot Visual Instruction Generation

Figure 2 for Coherent Zero-Shot Visual Instruction Generation

Figure 3 for Coherent Zero-Shot Visual Instruction Generation

Figure 4 for Coherent Zero-Shot Visual Instruction Generation

Abstract:Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions

* https://instruct-vis-zero.github.io/

Via

Access Paper or Ask Questions

VividDream: Generating 3D Scene with Ambient Dynamics

May 30, 2024

Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

Abstract:We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.

* Project page: https://vivid-dream-4d.github.io

Via

Access Paper or Ask Questions

On the Content Bias in Fréchet Video Distance

Apr 18, 2024

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang

Figure 1 for On the Content Bias in Fréchet Video Distance

Figure 2 for On the Content Bias in Fréchet Video Distance

Figure 3 for On the Content Bias in Fréchet Video Distance

Figure 4 for On the Content Bias in Fréchet Video Distance

Abstract:Fr\'echet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to conflict with human perception occasionally. In this paper, we aim to explore the extent of FVD's bias toward per-frame quality over temporal realism and identify its sources. We first quantify the FVD's sensitivity to the temporal axis by decoupling the frame and motion quality and find that the FVD increases only slightly with large temporal corruption. We then analyze the generated videos and show that via careful sampling from a large set of generated videos that do not contain motions, one can drastically decrease FVD without improving the temporal quality. Both studies suggest FVD's bias towards the quality of individual frames. We further observe that the bias can be attributed to the features extracted from a supervised video classifier trained on the content-biased dataset. We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality. Finally, we revisit a few real-world examples to validate our hypothesis.

* CVPR 2024. Project webpage: https://content-debiased-fvd.github.io/

Via

Access Paper or Ask Questions