Hong-Xing Yu

Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians

Mar 14, 2024

Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li

Abstract: Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, model 3D appearance and geometry but cannot simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method uses a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves high sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects, including future prediction and simulation under varying initial states and environmental parameters. Project page: https://zlicheng.com/spring_gaus.
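
To give a feel for what a spring-mass simulation step involves, here is a minimal, generic sketch; it is not the Spring-Gaus implementation. It advances a small mass-spring system by one semi-implicit Euler step using Hooke's law. The per-spring `stiffness` and `rest_len` lists are an illustrative stand-in for point-level physical parameters, and the function name, the z-up gravity default, and the simple multiplicative `damping` are assumptions.

```python
import math

def spring_mass_step(pos, vel, masses, springs, stiffness, rest_len, dt,
                     damping=0.0, gravity=(0.0, 0.0, -9.8)):
    """Advance a mass-spring system by one semi-implicit (symplectic) Euler step.

    pos, vel: lists of [x, y, z]; masses: list of floats.
    springs: list of (i, j) index pairs; stiffness, rest_len: one float per spring.
    gravity assumes a z-up world (an assumption for this sketch).
    """
    n = len(pos)
    # Start every point with its gravitational force.
    forces = [[m * g for g in gravity] for m in masses]
    for s, (i, j) in enumerate(springs):
        d = [pos[j][k] - pos[i][k] for k in range(3)]
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue  # coincident endpoints: spring direction undefined, skip
        # Hooke's law: magnitude proportional to extension beyond rest length.
        mag = stiffness[s] * (length - rest_len[s])
        f = [mag * c / length for c in d]
        for k in range(3):
            forces[i][k] += f[k]  # a stretched spring pulls i toward j
            forces[j][k] -= f[k]  # and j toward i
    new_pos, new_vel = [], []
    for p in range(n):
        # Update velocity first, then move with the new velocity (semi-implicit).
        v = [(vel[p][k] + dt * forces[p][k] / masses[p]) * (1.0 - damping)
             for k in range(3)]
        new_vel.append(v)
        new_pos.append([pos[p][k] + dt * v[k] for k in range(3)])
    return new_pos, new_vel
```

Updating velocity before position is a common choice for spring systems because it tends to be more stable than fully explicit Euler at the same time step.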

Unsupervised Discovery of Object-Centric Neural Fields

Feb 12, 2024

Rundong Luo, Hong-Xing Yu, Jiajun Wu

Abstract: We study inferring 3D object-centric scene representations from a single image. While recent methods have shown potential in unsupervised 3D object discovery from simple synthetic images, they fail to generalize to real-world scenes with visually rich and diverse objects. This limitation stems from their object representations, which entangle objects' intrinsic attributes like shape and appearance with extrinsic, viewer-centric properties such as their 3D location. To address this bottleneck, we propose Unsupervised discovery of Object-Centric neural Fields (uOCF). uOCF focuses on learning the intrinsics of objects and models the extrinsics separately. Our approach significantly improves systematic generalization, thus enabling unsupervised learning of high-fidelity object-centric scene representations from sparse real-world images. To evaluate our approach, we collect three new datasets, including two real kitchen environments. Extensive experiments show that uOCF enables unsupervised discovery of visually rich objects from a single real image, allowing applications such as 3D object segmentation and scene manipulation. Notably, uOCF demonstrates zero-shot generalization to unseen objects from a single real image. Project page: https://red-fairy.github.io/uOCF/
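
One common way to keep an object's intrinsics separate from its extrinsics is to map each world-space query point into the object's own canonical frame before evaluating that object's field, so the same intrinsic field can be reused at any location. The sketch below illustrates only this general idea and is not uOCF's architecture: extrinsics are assumed to be a 3D position plus a rotation about z, and the sphere signed-distance function is a hypothetical stand-in for a learned intrinsic field.

```python
import math

def world_to_object(point, position, yaw):
    """Express a world-space point in an object's canonical frame.

    Extrinsics (assumed for this sketch): a 3D position plus a rotation `yaw` about z.
    """
    dx = point[0] - position[0]
    dy = point[1] - position[1]
    dz = point[2] - position[2]
    # Undo the object's rotation by rotating the offset by -yaw.
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy, dz)

def sphere_field(local_point, radius=1.0):
    """Hypothetical intrinsic field: signed distance to a sphere in the object frame."""
    return math.sqrt(sum(c * c for c in local_point)) - radius

def query_scene(point, objects):
    """Union of objects: each intrinsic field is queried in its own object frame."""
    return min(obj["field"](world_to_object(point, obj["position"], obj["yaw"]))
               for obj in objects)
```

Because `sphere_field` never sees world coordinates, moving or rotating the object changes only its `position` and `yaw`, not the field itself.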

Fluid Simulation on Neural Flow Maps

Dec 22, 2023

Yitong Deng, Hong-Xing Yu, Diyang Zhang, Jiajun Wu, Bo Zhu

Abstract: We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena. We devise a novel hybrid neural field representation, Spatially Sparse Neural Fields (SSNF), which fuses small neural networks with a pyramid of overlapping, multi-resolution, and spatially sparse grids, to compactly represent long-term spatiotemporal velocity fields at high accuracy. With this neural velocity buffer in hand, we compute long-term, bidirectional flow maps and their Jacobians in a mechanistically symmetric manner, substantially improving accuracy over existing solutions. These long-range, bidirectional flow maps enable high advection accuracy with low dissipation, which in turn facilitates high-fidelity incompressible flow simulations that manifest intricate vortical structures. We demonstrate the efficacy of our neural fluid simulation in a variety of challenging scenarios, including leapfrogging vortices, colliding vortices, vortex reconnections, and vortex generation from moving obstacles and density differences. Our examples show improved performance over existing methods in terms of energy conservation, visual complexity, adherence to experimental observations, and preservation of detailed vortical structures.

* ACM Trans. Graph. 42, 6, Article 248 (December 2023), 21 pages

Inferring Hybrid Neural Fluid Fields from Videos

Dec 11, 2023

Hong-Xing Yu, Yang Zheng, Yuan Gao, Yitong Deng, Bo Zhu, Jiajun Wu

Abstract: We study recovering fluid density and velocity from sparse multiview videos. Existing neural dynamic reconstruction methods predominantly rely on optical flow; therefore, they cannot accurately estimate the density or uncover the underlying velocity, because of the inherent visual ambiguities of fluid velocity: fluids are often shapeless and lack stable visual features. The challenge is compounded by the turbulent nature of fluid flows, which calls for carefully designed fluid velocity representations. To address these challenges, we propose hybrid neural fluid fields (HyFluid), a neural approach to jointly infer fluid density and velocity fields. Specifically, to handle the visual ambiguities of fluid velocity, we introduce a set of physics-based losses that enforce a physically plausible velocity field, one that is divergence-free and drives the transport of density. To handle the turbulent nature of fluid velocity, we design a hybrid neural velocity representation that includes a base neural velocity field capturing most of the irrotational energy and a vortex particle-based velocity that models the residual turbulent velocity. We show that our method enables recovering vortical flow details. Our approach opens up possibilities for various learning and reconstruction applications centered around 3D incompressible flow, including fluid re-simulation and editing, future prediction, and neural dynamic scene composition. Project website: https://kovenyu.com/HyFluid/
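
A divergence-free constraint of the kind described above is often expressed as a penalty on the squared divergence of the velocity field. The sketch below is a generic illustration, not HyFluid's loss: it works on a 2D regular grid with central finite differences for brevity (HyFluid operates on 3D neural fields), and the function names and grid layout are assumptions. The density-transport loss is not shown.

```python
def divergence_2d(u, v, h):
    """Central-difference divergence du/dx + dv/dy at interior grid cells.

    u, v: 2D lists indexed [i][j], with i along y (rows) and j along x (columns).
    h: uniform grid spacing. Returns an (ny - 2) x (nx - 2) list.
    """
    ny, nx = len(u), len(u[0])
    div = []
    for i in range(1, ny - 1):
        row = []
        for j in range(1, nx - 1):
            du_dx = (u[i][j + 1] - u[i][j - 1]) / (2.0 * h)
            dv_dy = (v[i + 1][j] - v[i - 1][j]) / (2.0 * h)
            row.append(du_dx + dv_dy)
        div.append(row)
    return div

def divergence_penalty(u, v, h):
    """Mean squared divergence; zero for a discretely divergence-free field."""
    if len(u) < 3 or len(u[0]) < 3:
        raise ValueError("grid needs at least 3 x 3 cells for central differences")
    vals = [d for row in divergence_2d(u, v, h) for d in row]
    return sum(d * d for d in vals) / len(vals)
```

For example, the rotational field (u, v) = (-y, x) yields a penalty of zero, whereas the expanding field (u, v) = (x, y) has divergence 2 everywhere and is penalized.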

* NeurIPS 2023. The first two authors contributed equally. Project website: https://kovenyu.com/HyFluid/

3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

Dec 08, 2023

Yunhao Ge, Hong-Xing Yu, Cheng Zhao, Yuliang Guo, Xinyu Huang, Liu Ren, Laurent Itti, Jiajun Wu

Abstract: A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects promises to improve both the diversity and quantity of objects, it remains difficult because there is no effective method for inserting 3D objects into complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, and sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach that automatically copies virtual objects and pastes them into real scenes. The inserted objects have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects that avoid collisions with the existing room layout. It then estimates spatially varying illumination at the insertion location, so that the virtual objects blend into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object detection models and achieves state-of-the-art performance. For the first time, we demonstrate that physically plausible 3D object insertion, serving as a generative data augmentation technique, can significantly improve discriminative downstream tasks such as monocular 3D object detection. Project website: https://gyhandy.github.io/3D-Copy-Paste/

* NeurIPS 2023. Project website: https://gyhandy.github.io/3D-Copy-Paste/

WonderJourney: Going from Anywhere to Everywhere

Dec 06, 2023

Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu (+1 more)

Abstract: We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (given by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/

* Project website with video results: https://kovenyu.com/WonderJourney/

Are These the Same Apple? Comparing Images Based on Object Intrinsics

Nov 01, 2023

Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L. K. Yamins, Jiajun Wu

Abstract: The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step toward understanding and improving artificial vision systems is to measure image similarity purely based on the intrinsic object properties that define object identity. This problem has been studied in the computer vision literature as re-identification, though mostly restricted to specific object categories such as people and cars. We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics. To benchmark such measurements, we collect the Common paired objects Under differenT Extrinsics (CUTE) dataset of 18,000 images of 180 objects under different extrinsic factors such as lighting, poses, and imaging conditions. While existing methods such as LPIPS and CLIP scores do not measure object intrinsics well, we find that combining deep features learned from contrastive self-supervised learning with foreground filtering is a simple yet effective way to approximate the similarity. We conduct an extensive survey of pre-trained features and foreground extraction methods to arrive at a strong baseline that best measures intrinsic object-centric image similarity among current methods. Finally, we demonstrate that our approach can aid downstream applications, such as acting as an analog for human subjects and improving generalizable re-identification. Please see our project website at https://s-tian.github.io/projects/cute/ for visualizations of the data and demos of our metric.
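
The combination of deep features with foreground filtering can be pictured as pooling per-patch features only over foreground patches and then comparing the pooled vectors. The sketch below is an illustrative assumption, not the paper's baseline: the feature extractor and foreground mask are taken as given inputs (plain lists here), and mean pooling plus cosine similarity are choices made for this example.

```python
import math

def masked_mean(features, mask):
    """Average per-patch feature vectors over patches where mask is True."""
    kept = [f for f, m in zip(features, mask) if m]
    if not kept:
        raise ValueError("foreground mask selects no patches")
    dim = len(kept[0])
    return [sum(f[d] for f in kept) / len(kept) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def intrinsic_similarity(feats_a, mask_a, feats_b, mask_b):
    """Compare two images using only their foreground-pooled features.

    feats_*: per-patch features from some pre-trained backbone (assumed given).
    mask_*: per-patch booleans marking the foreground object (assumed given).
    """
    return cosine(masked_mean(feats_a, mask_a), masked_mean(feats_b, mask_b))
```

Filtering before pooling means background patches, which change with extrinsics such as scene context, do not contribute to the score.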

* First two authors contributed equally. Accepted at NeurIPS Datasets and Benchmarks Track 2023

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

Oct 27, 2023

Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun (+1 more)

Abstract: We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art LPIPS result on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/

* 17 pages

Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

Oct 25, 2023

Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu

Abstract: We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the performance of various inverse rendering methods. Existing real-world datasets typically consist only of the shape and multi-view images of objects, which is not sufficient for evaluating the quality of material recovery and object relighting. Methods capable of recovering material and lighting often resort to synthetic data for quantitative evaluation, which, however, does not guarantee generalization to complex real-world environments. We introduce a new dataset of real-world objects captured under a variety of natural scenes with ground-truth 3D scans, multi-view images, and environment lighting. Using this dataset, we establish the first comprehensive real-world evaluation benchmark for object inverse rendering tasks from in-the-wild scenes and compare the performance of various existing methods.

* NeurIPS 2023 Datasets and Benchmarks Track. The first two authors contributed equally to this work. Project page: https://stanfordorb.github.io/

Tree-Structured Shading Decomposition

Sep 13, 2023

Chen Geng, Hong-Xing Yu, Sharon Zhang, Maneesh Agrawala, Jiajun Wu

Abstract: We study inferring a tree-structured representation of object shading from a single image. Prior work typically models shading with parametric or measured representations, which are neither interpretable nor easily editable. We propose using the shade tree representation, which combines basic shading nodes and compositing methods to factorize object surface shading. The shade tree representation enables novice users who are unfamiliar with the physical shading process to edit object shading efficiently and intuitively. A main challenge in inferring the shade tree is that the inference problem involves both the discrete tree structure and the continuous parameters of the tree nodes. We propose a hybrid approach to address this issue: an auto-regressive inference model first generates a rough estimate of the tree structure and node parameters, and an optimization algorithm then fine-tunes the inferred shade tree. We show experiments on synthetic images, captured reflectance, real images, and non-realistic vector drawings, enabling downstream applications such as material editing, vectorized shading, and relighting. Project website: https://chen-geng.com/inv-shade-trees
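
A shade tree can be thought of as leaves that each produce a simple shading term and internal nodes that composite their children's outputs. The sketch below is a hypothetical illustration of evaluating such a tree for one shading sample, not the paper's node set or inference method: the `Leaf` and `Composite` types, the `add`/`multiply`/`mix` operators, and the single `n_dot_l` input (cosine between surface normal and light direction) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    """Hypothetical basic shading node: maps n_dot_l to a shading value."""
    fn: Callable[[float], float]

@dataclass
class Composite:
    """Hypothetical compositing node combining two subtrees."""
    op: str            # "add", "multiply", or "mix"
    left: "Node"
    right: "Node"
    weight: float = 0.5  # blend factor, used only by "mix"

Node = Union[Leaf, Composite]

def evaluate(node: Node, n_dot_l: float) -> float:
    """Recursively evaluate the tree for one shading sample."""
    if isinstance(node, Leaf):
        return node.fn(n_dot_l)
    a = evaluate(node.left, n_dot_l)
    b = evaluate(node.right, n_dot_l)
    if node.op == "add":
        return a + b
    if node.op == "multiply":
        return a * b
    if node.op == "mix":
        return (1.0 - node.weight) * a + node.weight * b
    raise ValueError(f"unknown op: {node.op}")

# Example: ambient term plus a diffuse lobe scaled by an albedo-like constant.
tree = Composite("add",
                 Leaf(lambda c: 0.1),
                 Composite("multiply", Leaf(lambda c: max(0.0, c)), Leaf(lambda c: 0.8)))
```

Editing shading then amounts to changing a node, for instance replacing the `0.8` leaf to recolor the diffuse term, without touching the rest of the tree.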

* Accepted at ICCV 2023. Project website: https://chen-geng.com/inv-shade-trees
