Shangzhan Zhang

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Apr 26, 2024

Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs. Specifically, our approach decomposes a shape into a set of segments and designs a segment-controlled diffusion model to synthesize 2D images that are aligned with mesh parts. Based on generated images, we initialize parameters of material graphs and fine-tune them through the differentiable rendering module to produce materials in accordance with the textual description. Extensive experiments demonstrate the superior performance of our framework in photorealism, resolution, and editability over existing methods. Project page: https://zhanghe3z.github.io/MaPa/

* SIGGRAPH 2024. Project page: https://zhanghe3z.github.io/MaPa/

Learning 3D-Aware GANs from Unposed Images with Template Feature Field

Apr 08, 2024

Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

Abstract: Collecting accurate camera poses of training images has been shown to well serve the learning of 3D-aware generative adversarial networks (GANs) yet can be quite expensive in practice. This work targets learning 3D-aware GANs from unposed images, for which we propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF). Concretely, in addition to a generative radiance field as in previous approaches, we ask the generator to also learn a field from 2D semantic features while sharing the density from the radiance field. Such a framework allows us to acquire a canonical 3D feature template leveraging the dataset mean discovered by the generative model, and further efficiently estimate the pose parameters on real data. Experimental results on various challenging datasets demonstrate the superiority of our approach over state-of-the-art alternatives from both the qualitative and the quantitative perspectives.

* Project page: https://XDimlab.github.io/TeFF

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Apr 05, 2024

Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou

Abstract: Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion can often be simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection. Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators, represents the 3D content of each frame efficiently using a triplane representation, and performs iterative updates using a transformer to estimate 3D trajectories. Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts. Extensive evaluation shows that our approach achieves state-of-the-art tracking performance both qualitatively and quantitatively, particularly in challenging scenarios such as out-of-plane rotation.

* Accepted to CVPR 2024 (selected as highlight paper). Project page: https://henry123-boy.github.io/SpaTracker/
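
The step of lifting 2D pixels to 3D with a depth estimate can be illustrated with a standard pinhole camera model. The following is only a minimal sketch, not the SpatialTracker implementation: the function names and the intrinsics (fx, fy, cx, cy) are hypothetical values chosen for illustration, and the depth is assumed to be measured along the optical axis.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with a depth value to a 3D point in camera space.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); depth is the z coordinate of the point.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def project_point(point: Point3,
                  fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-space 3D point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics and a depth value such as a monocular estimator might return.
lifted = unproject_pixel(400.0, 300.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
reprojected = project_point(lifted, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Projecting the lifted point with the same intrinsics returns the original pixel, which is a quick sanity check on any unprojection routine.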

PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

Sep 19, 2023

Xiao Fu, Shangzhan Zhang, Tianrun Chen, Yichong Lu, Xiaowei Zhou, Andreas Geiger, Yiyi Liao

Abstract: Training perception systems for self-driving cars requires substantial annotations. However, manual labeling in 2D images is highly labor-intensive. While existing datasets provide rich annotations for pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability for perception models. In this paper, we present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate consistent panoptic labels and high-quality images from any viewpoint. Our key insight lies in exploiting the complementarity of 3D and 2D priors to mutually enhance geometry and semantics. Specifically, we propose to leverage noisy semantic and instance labels in both 3D and 2D spaces to guide geometry optimization. Simultaneously, the improved geometry assists in filtering noise present in the 3D and 2D annotations by merging them in 3D space via a learned semantic field. To further enhance appearance, we combine MLP and hash grids to yield hybrid scene features, striking a balance between high-frequency appearance and predominantly contiguous semantics. Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over existing label transfer methods on the challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360 enables omnidirectional rendering of high-fidelity, multi-view and spatiotemporally consistent appearance, semantic and instance labels. We make our code and data available at https://github.com/fuxiao0719/PanopticNeRF

* Project page: http://fuxiao0719.github.io/projects/panopticnerf360/. arXiv admin note: text overlap with arXiv:2203.15224

Dyn-E: Local Appearance Editing of Dynamic Neural Radiance Fields

Jul 24, 2023

Shangzhan Zhang, Sida Peng, Yinji ShenTu, Qing Shuai, Tianrun Chen, Kaicheng Yu, Hujun Bao, Xiaowei Zhou

Abstract: Recently, the editing of neural radiance fields (NeRFs) has gained considerable attention, but most prior works focus on static scenes while research on the appearance editing of dynamic scenes is relatively lacking. In this paper, we propose a novel framework to edit the local appearance of dynamic NeRFs by manipulating pixels in a single frame of training video. Specifically, to locally edit the appearance of dynamic NeRFs while preserving unedited regions, we introduce a local surface representation of the edited region, which can be inserted into and rendered along with the original NeRF and warped to arbitrary other frames through a learned invertible motion representation network. By employing our method, users without professional expertise can easily add desired content to the appearance of a dynamic scene. We extensively evaluate our approach on various scenes and show that our approach achieves spatially and temporally consistent editing results. Notably, our approach is versatile and applicable to different variants of dynamic NeRF representations.

* Project page: https://dyn-e.github.io/

Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask

Feb 14, 2023

Shangzhan Zhang, Sida Peng, Tianrun Chen, Linzhan Mou, Haotong Lin, Kaicheng Yu, Yiyi Liao, Xiaowei Zhou

Abstract: We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes, trained with a collection of single images from the Internet. Prior works on 3D-aware image synthesis either require multi-view supervision or learning category-level prior for specific classes of objects, which can hardly work for natural scenes. Our key idea to solve this challenging problem is to use a semantic field as the intermediate representation, which is easier to reconstruct from an input semantic mask and then translate to a radiance field with the assistance of off-the-shelf semantic image synthesis models. Experiments show that our method outperforms baseline methods and produces photorealistic, multi-view consistent videos of a variety of natural scenes.

* Project website: https://zju3dv.github.io/paintingnature/

Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Mar 29, 2022

Xiao Fu, Shangzhan Zhang, Tianrun Chen, Yichong Lu, Lanyun Zhu, Xiaowei Zhou, Andreas Geiger, Yiyi Liao

Abstract: Large-scale training data with high-quality annotations is critical for training semantic and instance segmentation models. Unfortunately, pixel-wise annotation is labor-intensive and costly, raising the demand for more efficient labeling strategies. In this work, we present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels from easy-to-obtain coarse 3D bounding primitives. Our method utilizes NeRF as a differentiable tool to unify coarse 3D annotations and 2D semantic cues transferred from existing datasets. We demonstrate that this combination allows for improved geometry guided by semantic information, enabling rendering of accurate semantic maps across multiple views. Furthermore, this fusion process resolves label ambiguity of the coarse 3D annotations and filters noise in the 2D predictions. By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design. Experimental results show that Panoptic NeRF outperforms existing semantic and instance label transfer methods in terms of accuracy and multi-view consistency on challenging urban scenes of the KITTI-360 dataset.

* Project page: https://fuxiao0719.github.io/projects/panopticnerf/
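
The idea of inferring labels in 3D and rendering them to 2D can be sketched with the standard NeRF volume rendering weights, applied to per-sample class probabilities instead of colors. This is a minimal illustration under assumptions, not the Panoptic NeRF code: the function names, the two-class setup, and the sample values are all hypothetical.

```python
import math
from typing import List, Sequence


def render_weights(densities: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """Standard volume rendering weights w_i = T_i * alpha_i along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i) and T_i is the transmittance
    accumulated over the samples in front of sample i.
    """
    weights = []
    transmittance = 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_semantics(densities: Sequence[float], deltas: Sequence[float],
                     class_probs: Sequence[Sequence[float]]) -> List[float]:
    """Accumulate per-sample class probabilities into one per-pixel distribution."""
    weights = render_weights(densities, deltas)
    num_classes = len(class_probs[0])
    return [sum(w * probs[c] for w, probs in zip(weights, class_probs))
            for c in range(num_classes)]


# Hypothetical ray: empty space, then a dense sample of class 1, then class 0 behind it.
densities = [0.0, 50.0, 50.0]
deltas = [0.1, 0.1, 0.1]
class_probs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pixel_semantics = render_semantics(densities, deltas, class_probs)
pixel_label = max(range(len(pixel_semantics)), key=lambda c: pixel_semantics[c])
```

Because the first dense sample absorbs almost all of the ray, the rendered label follows the front surface (class 1 here), which is why better geometry directly yields cleaner 2D labels in this kind of formulation.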

Animatable Neural Implicit Surfaces for Creating Avatars from Videos

Mar 15, 2022

Sida Peng, Shangzhan Zhang, Zhen Xu, Chen Geng, Boyi Jiang, Hujun Bao, Xiaowei Zhou

Abstract: This paper aims to reconstruct an animatable human model from a video of very sparse camera views. Some recent works represent human geometry and appearance with neural radiance fields and utilize parametric human models to produce deformation fields for animation, which enables them to recover detailed 3D human models from videos. However, their reconstruction results tend to be noisy due to the lack of surface constraints on radiance fields. Moreover, as they generate the human appearance in 3D space, their rendering quality heavily depends on the accuracy of deformation fields. To solve these problems, we propose Animatable Neural Implicit Surface (AniSDF), which models the human geometry with a signed distance field and defers the appearance generation to the 2D image space with a 2D neural renderer. The signed distance field naturally regularizes the learned geometry, enabling the high-quality reconstruction of human bodies, which can be further used to improve the rendering speed. Moreover, the 2D neural renderer can be learned to compensate for geometric errors, making the rendering more robust to inaccurate deformations. Experiments on several datasets show that the proposed approach outperforms recent human reconstruction and synthesis methods by a large margin.

* Project page: https://zju3dv.github.io/animatable_sdf/

Animatable Neural Radiance Fields for Human Body Modeling

May 06, 2021

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Hujun Bao, Xiaowei Zhou

Abstract: This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a dynamic scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce neural blend weight fields to produce the deformation fields. Based on the skeleton-driven deformation, blend weight fields are used with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of deformation fields. Moreover, the learned blend weight fields can be combined with input skeletal motions to generate new deformation fields to animate the human model. Experiments show that our approach significantly outperforms recent human synthesis methods. The code will be available at https://zju3dv.github.io/animatable_nerf/.

* The first two authors contributed equally to this paper. Project page: https://zju3dv.github.io/animatable_nerf/
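
Skeleton-driven deformation with blend weights is commonly formulated as linear blend skinning, where a point is moved by a weighted sum of per-bone rigid transforms. The sketch below shows only that classical canonical-to-observation formula with fixed, hand-picked weights; it does not reproduce the paper's neural blend weight fields, and all names and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Transform = Tuple[List[List[float]], Vec3]  # (3x3 rotation matrix, translation)


def rotation_z(angle_rad: float) -> List[List[float]]:
    """Rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_transform(transform: Transform, point: Vec3) -> Vec3:
    """Apply one rigid bone transform: R @ point + t."""
    rotation, translation = transform
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def blend_skinning(point: Vec3, weights: Sequence[float],
                   transforms: Sequence[Transform]) -> Vec3:
    """Linear blend skinning: weighted sum of the point under each bone transform."""
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("blend weights must sum to 1")
    result = [0.0, 0.0, 0.0]
    for weight, transform in zip(weights, transforms):
        moved = apply_transform(transform, point)
        for i in range(3):
            result[i] += weight * moved[i]
    return (result[0], result[1], result[2])


# Hypothetical two-bone setup: one bone stays fixed, the other rotates 90 degrees about z.
fixed_bone: Transform = (rotation_z(0.0), (0.0, 0.0, 0.0))
bent_bone: Transform = (rotation_z(math.pi / 2), (0.0, 0.0, 0.0))
canonical_point: Vec3 = (1.0, 0.0, 0.0)
observed_point = blend_skinning(canonical_point, [0.5, 0.5], [fixed_bone, bent_bone])
```

With equal weights the point lands halfway between its two per-bone positions. In a learned setting the weights would vary per 3D point, and new skeletal motions would be applied by swapping in new bone transforms while keeping the weights.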
