Quan Meng

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Dec 03, 2024

Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai

Abstract: We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor enables text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds for generation of arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion generates a proxy semantic space composed of semantic 3D boxes. This space enables controllable editing of generated scenes by adding, removing, or resizing the semantic 3D proxy boxes, which in turn guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.
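
The box-level edits named in the abstract (adding, removing, and resizing semantic proxy boxes) can be illustrated with a minimal sketch. This is not the authors' implementation: the `SemanticBox` class and the helper functions are hypothetical names introduced here, and the sketch covers only the editing of the proxy layout, not the diffusion step that turns the edited layout into geometry.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SemanticBox:
    """Hypothetical axis-aligned proxy box: a class label plus two corners."""
    label: str
    min_corner: Vec3
    max_corner: Vec3


def add_box(boxes: List[SemanticBox], box: SemanticBox) -> List[SemanticBox]:
    """Return a new layout with one more proxy box."""
    return boxes + [box]


def remove_boxes(boxes: List[SemanticBox], label: str) -> List[SemanticBox]:
    """Return a new layout without any box carrying the given label."""
    return [b for b in boxes if b.label != label]


def scale_box(box: SemanticBox, factor: float) -> SemanticBox:
    """Resize a box about its center by a uniform factor."""
    pairs = list(zip(box.min_corner, box.max_corner))
    center = [(lo + hi) / 2 for lo, hi in pairs]
    half = [(hi - lo) / 2 * factor for lo, hi in pairs]
    return replace(
        box,
        min_corner=tuple(c - h for c, h in zip(center, half)),
        max_corner=tuple(c + h for c, h in zip(center, half)),
    )


# Assumed toy layout; labels and coordinates are made up for illustration.
layout = [
    SemanticBox("bed", (0.0, 0.0, 0.0), (2.0, 1.0, 2.0)),
    SemanticBox("chair", (3.0, 0.0, 0.0), (4.0, 1.0, 1.0)),
]
layout = add_box(layout, SemanticBox("table", (5.0, 0.0, 0.0), (6.0, 1.0, 1.0)))
layout = remove_boxes(layout, "chair")
layout[0] = scale_box(layout[0], 2.0)
```

In a pipeline of the kind the abstract describes, an edited layout like this would then condition the geometric generation stage; that stage is outside the scope of this sketch.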

* 21 pages, 12 figures; https://alexeybokhovkin.github.io/scenefactor/

LT3SD: Latent Trees for 3D Scene Diffusion

Sep 12, 2024

Quan Meng, Lei Li, Matthias Nießner, Angela Dai

Abstract: We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent advances in diffusion models have shown impressive results in 3D object generation, but are limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation to effectively encode both lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy. We can then learn a generative diffusion process in this latent 3D scene space, modeling the latent components of a scene at each resolution level. To synthesize large-scale scenes with varying sizes, we train our diffusion model on scene patches and synthesize arbitrary-sized output 3D scenes through shared diffusion generation across multiple scene patches. Through extensive experiments, we demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion for partial scene observations.
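
The idea of splitting a signal into a lower-frequency part and a higher-frequency residual, as in the coarse-to-fine hierarchy the abstract describes, can be shown with a toy example. This is only an analogy under stated assumptions: LT3SD learns latent encodings of 3D scene grids, whereas the sketch below uses a fixed, non-learned average-and-residual split on a 1D list, with function names chosen here for illustration.

```python
def downsample(signal):
    """Average adjacent pairs; assumes an even-length list."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(coarse):
    """Repeat each coarse value twice (nearest-neighbor upsampling)."""
    return [v for v in coarse for _ in range(2)]


def decompose(signal):
    """Split a signal into a coarse level and the detail it fails to capture."""
    coarse = downsample(signal)
    detail = [s - u for s, u in zip(signal, upsample(coarse))]
    return coarse, detail


def reconstruct(coarse, detail):
    """Recombine the two levels into the original-resolution signal."""
    return [u + d for u, d in zip(upsample(coarse), detail)]


signal = [1.0, 3.0, 2.0, 6.0]
coarse, detail = decompose(signal)
restored = reconstruct(coarse, detail)
```

Applying `decompose` again to `coarse` would give a deeper hierarchy, with one coarse level at the root and a detail level per resolution.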

* Project page: https://quan-meng.github.io/projects/lt3sd/ Video: https://youtu.be/AJ5sG9VyjGA

GNeRF: GAN-based Neural Radiance Field without Posed Camera

Mar 30, 2021

Quan Meng, Anpei Chen, Haimin Luo, Minye Wu, Hao Su, Lan Xu, Xuming He, Jingyi Yu

Abstract: We introduce GNeRF, a framework that marries Generative Adversarial Networks (GANs) with Neural Radiance Field (NeRF) reconstruction for complex scenarios with unknown and even randomly initialized camera poses. Recent NeRF-based advances have gained popularity for remarkably realistic novel view synthesis. However, most of them rely heavily on accurate camera pose estimation, while the few recent methods that optimize unknown camera poses are limited to roughly forward-facing scenes with relatively short camera trajectories and require rough camera pose initialization. In contrast, GNeRF uses only randomly initialized poses for complex outside-in scenarios. We propose a novel two-phase end-to-end framework. The first phase brings GANs into a new realm, jointly optimizing coarse camera poses and radiance fields, while the second phase refines them with an additional photometric loss. We overcome local minima using a hybrid and iterative optimization scheme. Extensive experiments on a variety of synthetic and natural scenes demonstrate the effectiveness of GNeRF. More impressively, our approach outperforms the baselines in scenes with repeated patterns or even low textures, which were previously regarded as extremely challenging.

LGNN: A Context-aware Line Segment Detector

Aug 29, 2020

Quan Meng, Jiakai Zhang, Qiang Hu, Xuming He, Jingyi Yu

Abstract: We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN). Existing approaches require a computationally expensive verification or postprocessing step. Our LGNN employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivity. Specifically, LGNN exploits a new quadruplet representation for each line segment, where the GNN module takes the predicted candidates as vertices and constructs a sparse graph to enforce structural context. Compared with the state-of-the-art, LGNN achieves near real-time performance without compromising accuracy. LGNN further enables time-sensitive 3D applications. When a 3D point cloud is accessible, we present a multi-modal line segment classification technique for extracting a 3D wireframe of the environment robustly and efficiently.

* 9 pages, 7 figures
