Angela Dai

MeshPad: Interactive Sketch Conditioned Artistic-designed Mesh Generation and Editing

Mar 03, 2025

Haoxuan Li, Ziya Erkoc, Lei Li, Daniele Sirigatti, Vladyslav Rozov, Angela Dai, Matthias Nießner

Abstract: We introduce MeshPad, a generative approach that creates 3D meshes from sketch inputs. Building on recent advances in artistic-designed triangle mesh generation, our approach addresses the need for interactive artistic mesh creation. To this end, we focus on enabling consistent edits by decomposing editing into 'deletion' of regions of a mesh, followed by 'addition' of new mesh geometry. Both operations are invoked by simple user edits of a sketch image, facilitating an iterative content creation process and enabling the construction of complex 3D meshes. Our approach is based on a triangle sequence-based mesh representation and uses a large Transformer model for both mesh triangle addition and deletion. To perform edits interactively, we introduce a vertex-aligned speculative prediction strategy on top of our additive mesh generator. This speculator predicts the multiple output tokens corresponding to a vertex, significantly reducing the computational cost of inference and making it possible to execute each editing step in only a few seconds. Comprehensive experiments demonstrate that MeshPad outperforms state-of-the-art sketch-conditioned mesh generation methods, improving mesh quality by more than 22% in Chamfer distance, and is preferred by 90% of participants in perceptual evaluations.

* Project page: https://derkleineli.github.io/meshpad/ Video: https://youtu.be/ql37mWf4pg8
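MeshPad's quantitative comparison is stated in terms of Chamfer distance, which is usually computed on points sampled from the surfaces of the two meshes being compared. A minimal brute-force sketch in Python follows. The abstract does not say which variant is reported, so using squared nearest-neighbour distances, averaged in each direction and summed, is an assumption, as is the function name `chamfer_distance`.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed variant: mean squared nearest-neighbour distance from a to b,
    plus the same from b to a. Brute force, O(len(a) * len(b)).
    """

    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # For each source point, squared distance to its closest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Example: two tiny point sets standing in for samples from two meshes.
pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.0)]
print(chamfer_distance(pred, ref))  # 2.0
```

For realistic sample counts (thousands of points per mesh), a spatial index such as a k-d tree would replace the quadratic inner loop; the result is the same.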

Use of Winsome Robots for Understanding Human Feedback (UWU)

Feb 07, 2025

Jessica Eggers, Angela Dai, Matthew C. Gombolay

Abstract: As social robots become more common, many have adopted cute aesthetics aimed at enhancing user comfort and acceptance. However, the effect of this aesthetic choice on human feedback in reinforcement learning scenarios remains unclear. Previous research has shown that humans tend to give more positive than negative feedback, which can prevent the robot from reaching optimal behavior. We hypothesize that this positive bias may be exacerbated by the robot's level of perceived cuteness. To investigate, we conducted a user study in which participants critiqued a robot's trajectories while it performed a task. We then analyzed the impact of the robot's aesthetic cuteness on the type of feedback participants gave. Our results suggest that the ratio of positive to negative feedback shifts when perceived cuteness changes. In light of this, we experiment with a stochastic version of TAMER that adapts to the user's level of positive feedback bias to mitigate these effects.
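TAMER learns a model H(s, a) of the human trainer's feedback and has the agent act greedily with respect to it. The abstract does not describe how the stochastic, bias-adaptive variant works, so the sketch below uses a hypothetical stand-in: each feedback signal is centered by the trainer's running mean before it updates a tabular H, so a trainer who praises everything expresses no preference between actions. The class name `BiasCenteredTamer`, the learning rate, and the centering rule are all illustrative assumptions, not the paper's method.

```python
import random
from collections import defaultdict


class BiasCenteredTamer:
    """Tabular TAMER-style learner with a hypothetical positivity-bias correction.

    H(s, a) predicts the human's feedback for taking action a in state s, and
    the agent acts greedily on it. Centering feedback by the trainer's running
    mean is an illustrative assumption, not the UWU paper's stochastic TAMER.
    """

    def __init__(self, actions, lr=0.5, seed=0):
        self.actions = list(actions)
        self.lr = lr
        self.h = defaultdict(float)  # (state, action) -> predicted feedback
        self.count = 0
        self.total = 0.0
        self.rng = random.Random(seed)

    def act(self, state):
        # Greedy with respect to the learned feedback model; ties broken at random.
        scores = {a: self.h[(state, a)] for a in self.actions}
        best = max(scores.values())
        return self.rng.choice([a for a, s in scores.items() if s == best])

    def update(self, state, action, feedback):
        # feedback is +1 (positive) or -1 (negative) from the human trainer.
        self.count += 1
        self.total += feedback
        centered = feedback - self.total / self.count  # remove the trainer's mean
        key = (state, action)
        self.h[key] += self.lr * (centered - self.h[key])


agent = BiasCenteredTamer(["left", "right"])
agent.update("s", "left", +1)
agent.update("s", "right", -1)
print(agent.act("s"))  # left
```

One consequence of this naive rule is that the very first signal is always centered to zero; a practical variant would start from a prior estimate of the trainer's bias instead.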

MeshArt: Generating Articulated Meshes with Structure-guided Transformers

Dec 16, 2024

Daoyi Gao, Yawar Siddiqui, Lei Li, Angela Dai

Abstract: Articulated 3D object generation is fundamental for creating realistic, functional, and interactable virtual assets that are not merely static. We introduce MeshArt, a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry, reminiscent of human-crafted 3D models. We approach articulated mesh generation in a part-by-part fashion across two stages. First, we generate a high-level articulation-aware object structure; then, based on this structural information, we synthesize each part's mesh faces. Key to our approach is modeling both articulation structures and part meshes as sequences of quantized triangle embeddings, leading to a unified hierarchical framework with transformers for autoregressive generation. Object part structures are first generated as their bounding primitives and articulation modes; a second transformer, guided by these articulation structures, then generates each part's mesh triangles. To ensure coherency among generated parts, we introduce structure-guided conditioning that also incorporates local part mesh connectivity. MeshArt shows significant improvements over the state of the art, with a 57.1% improvement in structure coverage and a 209-point improvement in mesh generation FID.

* Project Page: https://daoyig.github.io/Mesh_Art/
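MeshArt, like MeshPad, treats a mesh as a sequence of triangles. Triangle-sequence generators such as PolyGen and MeshGPT first quantize vertex coordinates onto a fixed grid and emit triangles in a canonical order. The quantized triangle embeddings described in the abstract are learned representations that this sketch does not reproduce; it shows only the simpler coordinate quantization and ordering step. The 128-bin resolution, the [-0.5, 0.5] bounding box, and the lexicographic sort order are assumptions.

```python
def quantize(coord, bins=128, lo=-0.5, hi=0.5):
    # Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1].
    t = (coord - lo) / (hi - lo)
    return min(bins - 1, max(0, int(t * bins)))


def triangles_to_tokens(triangles, bins=128):
    """Flatten triangles into a canonical sequence of quantized coordinates.

    Each triangle is three (x, y, z) vertices. Vertices inside a triangle and
    triangles inside the mesh are sorted lexicographically, so the same mesh
    always yields the same sequence (an assumed ordering convention).
    """
    quantized = []
    for tri in triangles:
        verts = sorted(tuple(quantize(c, bins) for c in v) for v in tri)
        quantized.append(verts)
    quantized.sort()
    # 3 vertices x 3 coordinates = 9 tokens per triangle.
    return [c for tri in quantized for v in tri for c in v]


tri = [(0.5, 0.5, 0.5), (-0.5, -0.5, -0.5), (0.0, 0.0, 0.0)]
print(triangles_to_tokens([tri]))  # [0, 0, 0, 64, 64, 64, 127, 127, 127]
```

The canonical ordering matters for autoregressive training: without it, the same mesh could appear as many different token sequences.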

Coherent 3D Scene Diffusion From a Single RGB Image

Dec 13, 2024

Manuel Dahnert, Angela Dai, Norman Müller, Matthias Nießner

Abstract: We present a novel diffusion-based approach for coherent 3D scene reconstruction from a single RGB image. Our method utilizes an image-conditioned 3D scene diffusion model to simultaneously denoise the 3D poses and geometries of all objects within the scene. Motivated by the ill-posed nature of the task and to obtain consistent scene reconstruction results, we learn a generative scene prior by conditioning on all scene objects simultaneously to capture the scene context and by allowing the model to learn inter-object relationships throughout the diffusion process. We further propose an efficient surface alignment loss to facilitate training even in the absence of full ground-truth annotation, which is common in publicly available datasets. This loss leverages an expressive shape representation, which enables direct point sampling from intermediate shape predictions. By framing the task of single RGB image 3D scene reconstruction as a conditional diffusion process, our approach surpasses current state-of-the-art methods, achieving a 12.04% improvement in AP3D on SUN RGB-D and a 13.43% increase in F-Score on Pix3D.

* Project Page: https://www.manuel-dahnert.com/research/scene-diffusion - Accepted at NeurIPS 2024
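The method frames reconstruction as conditional diffusion over the poses and geometries of all objects jointly. The forward (noising) half of a standard DDPM-style process can be sketched as below. The linear beta schedule and its endpoints, and the packing of every object's pose and shape parameters into one flat vector `x0`, are assumptions for illustration; the image-conditioned denoiser that learns to invert this process is not shown.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear schedule.
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    Also returns the noise eps, which a denoiser is commonly trained to predict.
    """
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    return xt, eps


alpha_bars = linear_alpha_bars()
# Hypothetical flat scene vector: two objects, each (tx, ty, tz, scale).
x0 = [0.2, 0.0, 1.5, 1.0, -0.7, 0.0, 2.0, 0.5]
xt, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Noising all objects in a single vector is what lets one denoiser see every object at once, which is the property the abstract relies on to model inter-object relationships.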

PrEditor3D: Fast and Precise 3D Shape Editing

Dec 09, 2024

Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang

Abstract: We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts and remains identical in regions that are not intended to be altered. To this end, we first project the 3D object onto images from four views and perform synchronized multi-view image editing guided by user-provided text prompts and rough masks. However, the targeted regions to be edited are ambiguous due to the projection from 3D to 2D. To ensure precise editing only in intended regions, we develop a 3D segmentation pipeline that detects edited areas in 3D space, followed by a merging algorithm that seamlessly integrates the edited 3D regions with the original input. Extensive experiments demonstrate the superiority of our method over previous approaches, enabling fast, high-quality editing while preserving unintended regions.

* Project Page: https://ziyaerkoc.com/preditor3d/ Video: https://www.youtube.com/watch?v=Ty2xXaEuewI

DNF: Unconditional 4D Generation with Dictionary-based Neural Fields

Dec 06, 2024

Xinyi Zhang, Naiqi Li, Angela Dai

Abstract: While remarkable success has been achieved through diffusion-based 3D generative models for shapes, 4D generative modeling remains challenging due to the complexity of object deformations over time. We propose DNF, a new 4D representation for unconditional generative modeling that efficiently models deformable shapes with disentangled shape and motion while capturing high-fidelity details in the deforming objects. To achieve this, we propose a dictionary learning approach to disentangle 4D motion from shape as neural fields. Both shape and motion are represented as learned latent spaces, where each deformable shape is represented by its shape and motion global latent codes, shape-specific coefficient vectors, and shared dictionary information. This captures both shape-specific detail and global shared information in the learned dictionary. Our dictionary-based representation balances fidelity, contiguity, and compression well; combined with a transformer-based diffusion model, it enables our method to generate effective, high-fidelity 4D animations.

* Project page: https://xzhang-t.github.io/project/DNF/
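In DNF, each shape is described by shape-specific coefficient vectors together with dictionary information shared across shapes. The sketch below illustrates only the linear core of that idea: a latent vector is reconstructed as a weighted sum of shared dictionary atoms, and the coefficients for a new target are fitted by gradient descent with the dictionary held fixed (the coefficient step of dictionary learning). The plain linear combination, the step size, and the function names are assumptions; the paper's neural-field decoder and global latent codes are not modeled.

```python
def reconstruct(dictionary, coeffs):
    """Weighted sum of dictionary atoms: sum_k coeffs[k] * dictionary[k]."""
    dim = len(dictionary[0])
    return [sum(c * atom[j] for c, atom in zip(coeffs, dictionary)) for j in range(dim)]


def fit_coefficients(dictionary, target, lr=0.1, steps=500):
    """Minimize 0.5 * ||reconstruct(dictionary, c) - target||^2 over c.

    Plain gradient descent with the shared dictionary held fixed; lr must be
    small relative to the dictionary's Gram matrix for the updates to converge.
    """
    coeffs = [0.0] * len(dictionary)
    for _ in range(steps):
        residual = [r - t for r, t in zip(reconstruct(dictionary, coeffs), target)]
        # Gradient w.r.t. coeffs[k] is the dot product of atom k with the residual.
        grads = [sum(a * r for a, r in zip(atom, residual)) for atom in dictionary]
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


shared_atoms = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # hypothetical shared dictionary
target_latent = [3.0, 2.0, 0.0]
coeffs = fit_coefficients(shared_atoms, target_latent)
print([round(c, 4) for c in coeffs])  # [1.0, 2.0]
```

Because the atoms are shared, adding a new shape costs only a small coefficient vector rather than a full latent, which is one way to read the compression trade-off the abstract mentions.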

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Dec 03, 2024

Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai

Abstract: We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor enables text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds to generate 3D scenes of arbitrary size. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion generates a proxy semantic space composed of semantic 3D boxes, which enables controllable editing of generated scenes by adding, removing, or resizing the semantic 3D proxy boxes; these box edits then guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.

* 21 pages, 12 figures; https://alexeybokhovkin.github.io/scenefactor/
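SceneFactor's edits act on a proxy of semantic 3D boxes rather than on geometry directly. The following sketch assumes axis-aligned boxes on an integer voxel grid, painted into a dense semantic label grid of the kind a geometry stage could be conditioned on. It shows how adding, removing, or resizing a box changes that conditioning. The `SemanticBox` type, the grid layout, the label ids, and the later-box-wins overlap rule are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SemanticBox:
    label: int                   # semantic class id; 0 is reserved for empty space
    lo: tuple[int, int, int]     # inclusive minimum voxel corner
    hi: tuple[int, int, int]     # exclusive maximum voxel corner


def rasterize(boxes, shape):
    """Paint boxes into a dense label grid indexed grid[x][y][z]; later boxes win."""
    nx, ny, nz = shape
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for box in boxes:
        for x in range(max(0, box.lo[0]), min(nx, box.hi[0])):
            for y in range(max(0, box.lo[1]), min(ny, box.hi[1])):
                for z in range(max(0, box.lo[2]), min(nz, box.hi[2])):
                    grid[x][y][z] = box.label
    return grid


def count_label(grid, label):
    return sum(v == label for plane in grid for row in plane for v in row)


bed = SemanticBox(label=1, lo=(0, 0, 0), hi=(2, 2, 1))
scene = [bed]
print(count_label(rasterize(scene, (4, 4, 2)), 1))  # 4
scene = [replace(bed, hi=(4, 2, 1))]                # resize the box
print(count_label(rasterize(scene, (4, 4, 2)), 1))  # 8
scene = []                                          # remove the box
print(count_label(rasterize(scene, (4, 4, 2)), 1))  # 0
```

Editing a handful of box parameters is far easier for a user than editing geometry, which is the motivation the abstract gives for the semantic proxy.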

GaussianSpeech: Audio-Driven Gaussian Avatars

Nov 27, 2024

Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple the speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. Due to the absence of high-quality datasets of talking humans in correspondence with audio, we captured a new large-scale multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance with visually natural motion at real-time rendering rates, while encompassing diverse facial expressions and styles.

* Paper Video: https://youtu.be/2VqYoFlYcwQ Project Page: https://shivangi-aneja.github.io/projects/gaussianspeech

L3DG: Latent 3D Gaussian Diffusion

Oct 17, 2024

Barbara Roessle, Norman Müller, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Angela Dai, Matthias Nießner

Abstract: We propose L3DG, the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. This enables effective generative 3D modeling that scales to the generation of entire room-scale scenes, which can be rendered very efficiently. To enable effective synthesis of 3D Gaussians, we propose a latent diffusion formulation operating in a compressed latent space of 3D Gaussians. This compressed latent space is learned by a vector-quantized variational autoencoder (VQ-VAE), for which we employ a sparse convolutional architecture to efficiently operate on room-scale scenes. This substantially reduces the complexity of the costly diffusion-based generation process, allowing higher detail in object-level generation as well as scalability to large scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real time. We demonstrate that our approach significantly improves visual quality over prior work on unconditional object-level radiance field synthesis and showcase its applicability to room-scale scene generation.

* SIGGRAPH Asia 2024, project page: https://barbararoessle.github.io/l3dg , video: https://youtu.be/UHEEiXCYeLU
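L3DG compresses scenes into a latent space learned by a VQ-VAE. The quantization step at the heart of a VQ-VAE replaces each encoder output with its nearest codebook vector; a minimal pure-Python sketch of that lookup follows. The codebook values are made up, and the training details (codebook and commitment losses, the straight-through gradient estimator), as well as the sparse convolutional encoder, are not shown.

```python
def vector_quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2 distance).

    Returns the chosen code indices and the quantized vectors.
    """
    indices, quantized = [], []
    for z in latents:
        dists = [sum((zi - ei) ** 2 for zi, ei in zip(z, entry)) for entry in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # made-up 2D codebook
latents = [[0.1, -0.2], [0.9, 1.3], [4.0, 4.0]]
idx, zq = vector_quantize(latents, codebook)
print(idx)  # [0, 1, 2]
```

The diffusion model then operates on this compact, discretized representation instead of on raw Gaussian parameters, which is where the complexity reduction described above comes from.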

LT3SD: Latent Trees for 3D Scene Diffusion

Sep 12, 2024

Quan Meng, Lei Li, Matthias Nießner, Angela Dai

Abstract: We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent advances in diffusion models have shown impressive results in 3D object generation, but are limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation that effectively encodes both lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy. We then learn a generative diffusion process in this latent 3D scene space, modeling the latent components of a scene at each resolution level. To synthesize large-scale scenes of varying sizes, we train our diffusion model on scene patches and synthesize arbitrary-sized output 3D scenes through shared diffusion generation across multiple scene patches. Through extensive experiments, we demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion of partial scene observations.

* Project page: https://quan-meng.github.io/projects/lt3sd/ Video: https://youtu.be/AJ5sG9VyjGA
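LT3SD's latent tree separates lower-frequency geometry from higher-frequency detail across resolution levels. As a fixed-filter analogy only (the paper learns its latent components; this sketch does not), a Laplacian-pyramid-style decomposition of a 1D signal shows the same coarse-plus-residual structure and its exact reconstruction. Pairwise averaging for downsampling and repetition for upsampling are assumptions; a 3D version would pool 2x2x2 voxel blocks instead of pairs.

```python
def decompose(signal, levels):
    """Split a 1D signal (length divisible by 2**levels) into coarse + details.

    Returns the coarsest signal and one residual list per level, fine to coarse.
    """
    details, current = [], list(signal)
    for _ in range(levels):
        # Low frequencies: average neighbouring pairs.
        coarse = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        upsampled = [c for c in coarse for _ in range(2)]
        # High frequencies: whatever the coarse level cannot explain.
        details.append([f - u for f, u in zip(current, upsampled)])
        current = coarse
    return current, details


def compose(coarsest, details):
    """Invert decompose: upsample and add back residuals, coarse to fine."""
    current = list(coarsest)
    for detail in reversed(details):
        upsampled = [c for c in current for _ in range(2)]
        current = [u + d for u, d in zip(upsampled, detail)]
    return current


signal = [1, 3, 5, 7, 2, 2, 0, 4]
coarsest, details = decompose(signal, levels=2)
print(coarsest)  # [4.0, 2.0]
print(details)   # [[-1.0, 1.0, -1.0, 1.0, 0.0, 0.0, -2.0, 2.0], [-2.0, 2.0, 0.0, 0.0]]
```

A generative model over such a hierarchy can produce the coarse level first and then add detail level by level, which mirrors the coarse-to-fine modeling the abstract describes.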
