Zexiang Xu

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Apr 30, 2024

Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

Abstract: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture: we patchify the input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs, which can only reconstruct objects, GS-LRM predicts per-pixel Gaussians and therefore naturally handles scenes with large variations in scale and complexity. We show that our model works on both object and scene captures by training it on Objaverse and RealEstate10K, respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks. Our project webpage is available at: https://sai-bi.github.io/project/gs-lrm/ .

* Project webpage: https://sai-bi.github.io/project/gs-lrm/
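
To make the token and per-pixel Gaussian bookkeeping above concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the authors' code: the 8×8 patch size, the 9-channel input (RGB plus a 6-D per-pixel ray embedding), and placing each Gaussian center along its pixel's camera ray at a predicted depth are all assumptions, and the transformer blocks themselves are omitted.

```python
import numpy as np

def patchify(images: np.ndarray, patch: int) -> np.ndarray:
    """Split V images of shape (V, H, W, C) into non-overlapping patch tokens.

    Returns (V * (H // patch) * (W // patch), patch * patch * C): the concatenated
    multi-view token sequence that would be fed to the transformer.
    """
    v, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    x = images.reshape(v, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (V, H/p, W/p, p, p, C)
    return x.reshape(-1, patch * patch * c)

def unpatchify(tokens: np.ndarray, v: int, h: int, w: int, patch: int, d: int) -> np.ndarray:
    """Inverse of patchify for per-pixel outputs: (N, p*p*d) -> (V, H, W, d)."""
    x = tokens.reshape(v, h // patch, w // patch, patch, patch, d)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (V, H/p, p, W/p, p, d)
    return x.reshape(v, h, w, d)

def pixel_aligned_centers(ray_o: np.ndarray, ray_d: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Assumed placement of one Gaussian per pixel along its ray: center = o + depth * d."""
    return ray_o + depth[..., None] * ray_d
```

Under these assumptions, four 256×256 views with 8×8 patches give the transformer 4 × 32 × 32 = 4,096 tokens, and because every output token is unpatchified back to pixels, the model emits one Gaussian per input pixel (4 × 256 × 256 = 262,144 Gaussians).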

DMesh: A Differentiable Representation for General Meshes

Apr 20, 2024

Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and the connectivity information of a mesh. In our design, we first obtain a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and then formulate, in a differentiable manner based on the WDT, the probability that each face exists on the desired mesh. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh from various observations, such as point clouds and multi-view images, using gradient-based optimization. The source code and full paper are available at: https://sonsang.github.io/dmesh-project.

* 17 pages, 9 figures

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Apr 18, 2024

Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Unlike previous large reconstruction models (LRMs), which focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a pre-trained NeRF LRM with mesh rendering. Moreover, we improve the LRM architecture by simplifying several complex designs in previous LRMs. MeshLRM's NeRF initialization is trained sequentially with low- and high-resolution images; this new LRM training strategy enables significantly faster convergence and thereby leads to better quality with less compute. Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also allows for many downstream applications, including text-to-3D and single-image-to-3D generation. Project page: https://sarahweiii.github.io/meshlrm/

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Nov 23, 2023

Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

Abstract: Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm: we first generate a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regress the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.

* Project webpage: https://jiahao.ai/instant3d/
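
The first stage produces all four views "in one shot". One way to do this, and the assumption made here, is for the diffusion model to generate a single image laid out as a 2×2 grid, which is then cut into four views before being passed to the reconstructor. A minimal NumPy sketch of that split; the function name `split_grid` and the row-major tile order are hypothetical choices for illustration.

```python
import numpy as np

def split_grid(grid: np.ndarray, rows: int = 2, cols: int = 2) -> list:
    """Split an (H, W, C) image laid out as a rows x cols grid into tiles.

    Tiles are returned in row-major order (top-left, top-right, bottom-left, ...),
    each of shape (H // rows, W // cols, C).
    """
    h, w = grid.shape[:2]
    assert h % rows == 0 and w % cols == 0, "grid must divide evenly into tiles"
    th, tw = h // rows, w // cols
    return [grid[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]
```

For scale, the quoted runtimes (20 seconds versus 1 to 10 hours) correspond to a 180× to 1,800× speedup.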

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Nov 23, 2023

Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang

Abstract: We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images, even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method that uses self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a large amount of multi-view posed data covering ~1M objects, PF-LRM shows strong cross-dataset generalization and outperforms baseline methods by a large margin in both pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability to downstream text/image-to-3D tasks with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm .

* Project website: https://totoro97.github.io/pf-lrm ; add more experiments
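
PF-LRM obtains each camera pose from the predicted per-view point cloud with a differentiable PnP solver. To illustrate only the underlying 2D-3D pose problem, here is the classic closed-form Direct Linear Transform (DLT) PnP in NumPy. It is a swapped-in textbook technique, not the paper's differentiable solver, and it assumes noise-free correspondences, normalized image coordinates (intrinsics already removed), and at least six non-coplanar points.

```python
import numpy as np

def pnp_dlt(points_3d: np.ndarray, points_2d: np.ndarray):
    """Estimate (R, t) from >= 6 non-coplanar 2D-3D correspondences via DLT.

    points_3d: (N, 3) points in the object frame.
    points_2d: (N, 2) normalized image coordinates (u, v) = (x / z, y / z).
    Returns a rotation R (3, 3) and translation t (3,) with camera point = R @ X + t.
    """
    n = points_3d.shape[0]
    X = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous 3D points, (N, 4)
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = points_2d[i]
        # From u = (p1 . X) / (p3 . X) and v = (p2 . X) / (p3 . X).
        A[2 * i, 0:4] = X[i]
        A[2 * i, 8:12] = -u * X[i]
        A[2 * i + 1, 4:8] = X[i]
        A[2 * i + 1, 8:12] = -v * X[i]
    # The 3x4 projection matrix is the right singular vector of the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # P is defined only up to scale and sign; choose the sign with det(P[:, :3]) > 0.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    # Project the left 3x3 block onto the nearest rotation and divide out the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

With noisy predictions, a practical solver would add normalization, robust weighting, or iterative refinement; the abstract indicates PF-LRM instead makes the PnP step differentiable so it can be trained end to end.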

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Nov 15, 2023

Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu (+1 more)

Abstract: We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in ~30 s on a single A100 GPU. We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem, where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/ .

* Project Page: https://justimyhxu.github.io/projects/dmv3d/
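
In DMV3D the reconstruction model acts as the denoiser: noisy multi-view images are reconstructed into a triplane NeRF and re-rendered, and those renderings serve as the clean-image (x0) prediction at each step. The NumPy sketch below shows where such a predictor plugs into a standard deterministic DDIM sampler. `reconstruct_and_render` is a hypothetical stand-in for the LRM, and the noise schedule is an arbitrary assumption; the paper's actual sampler and schedule may differ.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion used to corrupt clean views: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def ddim_step(x_t, x0_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) driven by an x0 prediction."""
    eps_pred = (x_t - np.sqrt(alpha_bar_t) * x0_pred) / np.sqrt(1.0 - alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def sample(shape, reconstruct_and_render, alpha_bars, seed=0):
    """Sample multi-view images from pure noise.

    alpha_bars is decreasing in the step index, with alpha_bars[0] == 1.0 (clean).
    reconstruct_and_render(x_t, step) must return an x0 estimate of the same shape,
    e.g. renderings of a 3D reconstruction built from the noisy views.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for i in range(len(alpha_bars) - 1, 0, -1):
        x0_pred = reconstruct_and_render(x, i)
        x = ddim_step(x, x0_pred, alpha_bars[i], alpha_bars[i - 1])
    return x
```

Because every x0 prediction is a rendering of a single 3D reconstruction, the denoised views stay multi-view consistent by construction, which is what lets the final step double as the 3D output.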

Controllable Dynamic Appearance for Neural 3D Portraits

Sep 21, 2023

ShahRukh Athar, Zhixin Shu, Zexiang Xu, Fujun Luan, Sai Bi, Kalyan Sunkavalli, Dimitris Samaras

Abstract: Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head pose, facial expressions, and viewing direction. However, training such models assumes photometric consistency over the deformed region; e.g., the face must be evenly lit as it deforms with changing head pose and facial expression. Such photometric consistency across the frames of a video is hard to maintain, even in studio environments, which makes the resulting reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions. CoDyNeRF learns to approximate illumination-dependent effects via a dynamic appearance model in the canonical space that is conditioned on predicted surface normals as well as the facial-expression and head-pose deformations. Surface-normal prediction is guided by 3DMM normals, which act as a coarse prior for the normals of the human head, where direct prediction of normals is hard due to the rigid and non-rigid deformations induced by head-pose and facial-expression changes. Using only a short smartphone-captured video of a subject for training, we demonstrate the effectiveness of our method on free-view synthesis of a portrait scene with explicit head-pose and expression controls and realistic lighting effects. The project page can be found here: http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html

OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects

Sep 14, 2023

Isabella Liu, Linghao Chen, Ziyang Fu, Liwen Wu, Haian Jin, Zhong Li, Chin Ming Ryan Wong, Yi Xu, Ravi Ramamoorthi, Zexiang Xu (+1 more)

Abstract: We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. For each image in the dataset, we provide accurate camera parameters, illumination ground truth, and foreground segmentation masks. Our dataset enables the quantitative evaluation of most inverse rendering and material decomposition methods on real objects. We examine several state-of-the-art inverse rendering methods on our dataset and compare their performance. The dataset and code can be found on the project page: https://oppo-us-research.github.io/OpenIllumination.

Strivec: Sparse Tri-Vector Radiance Fields

Jul 25, 2023

Quankai Gao, Qiangeng Xu, Hao Su, Ulrich Neumann, Zexiang Xu

Abstract: We propose Strivec, a novel neural representation that models a 3D scene as a radiance field with sparsely distributed and compactly factorized local tensor feature grids. Our approach leverages tensor decomposition, following the recent work TensoRF, to model the tensor grids. In contrast to TensoRF, which uses a global tensor and focuses on its vector-matrix decomposition, we propose to use a cloud of local tensors and apply the classic CANDECOMP/PARAFAC (CP) decomposition to factorize each tensor into triple vectors that express local feature distributions along the spatial axes and compactly encode a local neural field. We also apply multi-scale tensor grids to discover geometry and appearance commonalities and to exploit spatial coherence with the tri-vector factorization at multiple local scales. The final radiance field properties are regressed by aggregating neural features from multiple local tensors across all scales. Our tri-vector tensors are sparsely distributed around the actual scene surface, which is discovered by a fast coarse reconstruction, leveraging the sparsity of a 3D scene. We demonstrate that our model achieves better rendering quality while using significantly fewer parameters than previous methods, including TensoRF and Instant-NGP.
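
The CP tri-vector idea can be sketched for a single local tensor: for each of R rank-1 components, the tensor stores one vector per spatial axis, and the feature at a point is the sum over components of the product of the three interpolated vector entries, mapped through a basis matrix as in TensoRF's CP variant. This NumPy sketch assumes that basis-matrix formulation and a local grid-coordinate convention; Strivec's multi-scale aggregation across neighboring tensors and its decoder are omitted.

```python
import numpy as np

def interp1d(vec: np.ndarray, coord: float) -> np.ndarray:
    """Linearly interpolate an (N, R) table of per-axis factors at a continuous coordinate."""
    n = vec.shape[0]
    coord = np.clip(coord, 0.0, n - 1.0)
    i0 = int(np.floor(coord))
    i1 = min(i0 + 1, n - 1)
    w = coord - i0
    return (1.0 - w) * vec[i0] + w * vec[i1]

def trivector_feature(vx, vy, vz, basis, p):
    """Feature of one CP-factorized local tensor at local grid coordinate p = (x, y, z).

    vx, vy, vz: (N, R) factor vectors along x, y, z for R rank-1 components.
    basis: (R, F) matrix mapping component responses to an F-dim feature.
    Computes sum_r vx_r(x) * vy_r(y) * vz_r(z) * basis[r].
    """
    a = interp1d(vx, p[0]) * interp1d(vy, p[1]) * interp1d(vz, p[2])  # (R,)
    return a @ basis  # (F,)
```

Because a CP tensor is multilinear, interpolating each axis vector separately gives exactly the trilinear interpolation of the corresponding dense grid. The storage difference is large: a dense grid needs N³ values per channel (32,768 for N = 32), whereas the tri-vector factors need 3·N·R values in total (1,536 for R = 16), plus the small R×F basis.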

Neural Free-Viewpoint Relighting for Glossy Indirect Illumination

Jul 12, 2023

Nithin Raghavan, Yan Xiao, Kai-En Lin, Tiancheng Sun, Sai Bi, Zexiang Xu, Tzu-Mao Li, Ravi Ramamoorthi

Abstract: Precomputed Radiance Transfer (PRT) remains an attractive solution for real-time rendering of complex light transport effects such as glossy global illumination. After precomputation, we can relight the scene with new environment maps while changing the viewpoint in real time. However, practical PRT methods are usually limited to low-frequency spherical harmonic lighting. All-frequency techniques using wavelets are promising but have so far had little practical impact. The curse of dimensionality and much higher data requirements have typically limited them to relighting with a fixed view, or to direct lighting only via triple product integrals. In this paper, we demonstrate a hybrid neural-wavelet PRT solution for high-frequency indirect illumination, including glossy reflection, for relighting with changing view. Specifically, we seek to represent the light transport function in the Haar wavelet basis. For global illumination, we learn the wavelet transport using a small multi-layer perceptron (MLP) applied to a feature field as a function of spatial location and wavelet index, with reflected direction and material parameters as additional MLP inputs. We optimize (learn) the feature field (compactly represented by a tensor decomposition) and the MLP parameters from multiple images of the scene under different lighting and viewing conditions. We demonstrate real-time (512×512 at 24 FPS, 800×600 at 13 FPS) precomputed rendering of challenging scenes involving view-dependent reflections and even caustics.

* 13 pages, 9 figures; to appear in CGF (Proceedings of EGSR 2023)
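
In PRT, the radiance leaving a shading point toward a given view is an inner product between a light-transport vector and the environment lighting. Because the Haar transform is orthonormal, that inner product can be evaluated directly on wavelet coefficients, and keeping only the largest lighting coefficients gives a compact all-frequency approximation. The NumPy sketch below shows this in 1D for brevity (real environment maps would use a 2D transform, for example per cube-map face); the function names are hypothetical, and the paper's learned MLP transport is not modeled.

```python
from typing import Optional

import numpy as np

def haar_1d(x) -> np.ndarray:
    """Orthonormal 1D Haar wavelet transform of a length-2^k signal."""
    out = np.asarray(x, dtype=float).copy()
    n = out.size
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    while n > 1:
        half = n // 2
        even, odd = out[0:n:2].copy(), out[1:n:2].copy()
        out[:half] = (even + odd) / np.sqrt(2.0)   # scaling (average) coefficients
        out[half:n] = (even - odd) / np.sqrt(2.0)  # wavelet (detail) coefficients
        n = half
    return out

def relight(transport, light, keep: Optional[int] = None) -> float:
    """Outgoing radiance <T, L>, evaluated on Haar coefficients.

    If `keep` is given, only the `keep` largest-magnitude lighting coefficients are
    retained (nonlinear wavelet approximation); otherwise the result is exact.
    """
    t_w, l_w = haar_1d(transport), haar_1d(light)
    if keep is not None:
        n_drop = max(l_w.size - keep, 0)
        l_w[np.argsort(np.abs(l_w))[:n_drop]] = 0.0  # zero the smallest coefficients
    return float(t_w @ l_w)
```

Unlike a low-order spherical harmonic projection, truncating in the Haar basis keeps whichever coefficients matter most, so sharp lights and glossy highlights survive with few terms.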
