Yan Luximon

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Mar 14, 2025

Xiaokang Wei, Bowen Zhang, Xianghui Yang, Yuxuan Wang, Chunchao Guo, Xi Zhao, Yan Luximon

Abstract: Generating high-quality physically based rendering (PBR) materials is important for achieving realistic rendering in downstream tasks, yet it remains challenging due to the intertwined effects of materials and lighting. While existing methods have made breakthroughs by incorporating material decomposition into the 3D generation pipeline, they tend to bake highlights into albedo and ignore the spatially varying properties of metallicity and roughness. In this work, we present PBR3DGen, a two-stage mesh generation method with high-quality PBR materials that integrates a novel multi-view PBR material estimation model and a 3D PBR mesh reconstruction model. Specifically, PBR3DGen leverages vision language models (VLM) to guide multi-view diffusion, precisely capturing the spatial distribution and inherent attributes of reflective-metalness materials. Additionally, we incorporate view-dependent illumination-aware conditions as pixel-aware priors to enhance spatially varying material properties. Furthermore, our reconstruction model reconstructs high-quality meshes with PBR materials. Experimental results demonstrate that PBR3DGen significantly outperforms existing methods, achieving new state-of-the-art results for PBR estimation and mesh generation. More results and visualizations can be found on our project page: https://pbr3dgen1218.github.io/.

* Homepage: https://pbr3dgen1218.github.io/
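
The abstract's point that existing methods "bake highlights into albedo" is easier to see against a concrete shading model. The sketch below evaluates the widely used metallic-roughness Cook-Torrance BRDF (GGX distribution, Schlick-GGX masking, Schlick Fresnel) for a single light. It is a standard reference model chosen for illustration, not the material model or renderer used in PBR3DGen, and the function names are hypothetical. Because the specular term depends on the light and view directions, a highlight painted into `albedo` becomes wrong as soon as either changes, while `metallic` and `roughness` control the tint and spread of that term at each surface point.

```python
import math

def ggx_ndf(n_dot_h: float, roughness: float) -> float:
    """GGX (Trowbridge-Reitz) normal distribution with alpha = roughness**2 (roughness > 0)."""
    alpha2 = (roughness * roughness) ** 2
    denom = n_dot_h * n_dot_h * (alpha2 - 1.0) + 1.0
    return alpha2 / (math.pi * denom * denom)

def smith_g1(n_dot_x: float, k: float) -> float:
    """Schlick-GGX approximation of the Smith masking term for one direction."""
    return n_dot_x / (n_dot_x * (1.0 - k) + k)

def shade(albedo, metallic, roughness, n_dot_l, n_dot_v, n_dot_h, v_dot_h):
    """Reflected RGB radiance for one unit-intensity white light.

    The last four arguments are cosines between the unit normal n, light
    direction l, view direction v, and half vector h.
    """
    if n_dot_l <= 0.0 or n_dot_v <= 0.0:
        return (0.0, 0.0, 0.0)
    # Metals tint their specular reflection with the albedo; dielectrics
    # reflect about 4% at normal incidence and keep a diffuse lobe.
    f0 = [0.04 * (1.0 - metallic) + c * metallic for c in albedo]
    d = ggx_ndf(n_dot_h, roughness)
    k = (roughness + 1.0) ** 2 / 8.0  # common remapping for direct lighting
    g = smith_g1(n_dot_l, k) * smith_g1(n_dot_v, k)
    out = []
    for c, f0_c in zip(albedo, f0):
        f = f0_c + (1.0 - f0_c) * (1.0 - v_dot_h) ** 5  # Schlick Fresnel
        specular = d * g * f / (4.0 * n_dot_l * n_dot_v)
        diffuse = (1.0 - f) * (1.0 - metallic) * c / math.pi
        out.append((diffuse + specular) * n_dot_l)
    return tuple(out)

# Same red albedo, head-on view: the metal has no diffuse lobe and a red-tinted
# highlight, while the dielectric's specular part is identical in every channel.
metal = shade((0.9, 0.1, 0.1), 1.0, 0.3, 1.0, 1.0, 1.0, 1.0)
plastic = shade((0.9, 0.1, 0.1), 0.0, 0.3, 1.0, 1.0, 1.0, 1.0)
```

Keeping albedo, metallic, and roughness as separate inputs is what lets a renderer move the highlight correctly under new lighting; a decomposition that folds the specular term into albedo cannot do that.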

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Nov 21, 2024

Zhuoman Liu, Weicai Ye, Yan Luximon, Pengfei Wan, Di Zhang

Abstract: Realistic simulation of dynamic scenes requires accurately capturing diverse material properties and modeling complex object interactions grounded in physical principles. However, existing methods are constrained to basic material types with limited predictable parameters, making them insufficient to represent the complexity of real-world materials. We introduce a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation. Our method utilizes multi-modal models to identify material types and initialize material parameters through image queries, while simultaneously inferring 3D Gaussian splats for detailed scene representation. We further refine these material parameters using video diffusion with a differentiable Material Point Method (MPM) and optical flow guidance rather than render loss or Score Distillation Sampling (SDS) loss. This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios, advancing both accuracy and flexibility in physics-based simulations.

* Homepage: https://zhuomanliu.github.io/PhysFlow/
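
The abstract says material parameters are refined with optical-flow guidance instead of a render or SDS loss. One common way to compare two flow fields is the mean endpoint error (EPE), and the sketch below uses it as the matching loss. To stay self-contained it replaces the differentiable MPM simulator with a hypothetical toy function and replaces gradient-based refinement with a grid search; `toy_simulated_flow` and `fit_stiffness` are illustrative names, and the paper's actual loss and optimizer may differ.

```python
import math

Flow = list[list[tuple[float, float]]]  # H x W grid of per-pixel (du, dv) displacements

def endpoint_error(pred: Flow, target: Flow) -> float:
    """Mean endpoint error: average Euclidean distance between flow vectors."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for (up, vp), (ut, vt) in zip(row_p, row_t):
            total += math.hypot(up - ut, vp - vt)
            count += 1
    return total / count if count else 0.0

def toy_simulated_flow(stiffness: float, h: int = 4, w: int = 4) -> Flow:
    """Hypothetical stand-in for 'simulate with MPM, render, estimate flow'.

    A stiffer material deforms less, so every pixel moves by 1 / stiffness.
    """
    d = 1.0 / stiffness
    return [[(d, 0.0)] * w for _ in range(h)]

def fit_stiffness(target: Flow, candidates: list[float]) -> float:
    """Pick the candidate whose simulated flow best matches the target flow."""
    return min(candidates, key=lambda s: endpoint_error(toy_simulated_flow(s), target))

reference = toy_simulated_flow(2.0)  # pretend this flow came from the generated video
best = fit_stiffness(reference, [0.5, 1.0, 2.0, 4.0])
```

With a differentiable simulator, the same EPE would be backpropagated to the material parameters instead of being scanned over a fixed candidate list.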

GoalGrasp: Grasping Goals in Partially Occluded Scenarios without Grasp Training

May 08, 2024

Shun Gui, Yan Luximon

Abstract: We present GoalGrasp, a simple yet effective 6-DOF robot grasp pose detection method that does not rely on grasp pose annotations and grasp training. Our approach enables user-specified object grasping in partially occluded scenes. By combining 3D bounding boxes and simple human grasp priors, our method introduces a novel paradigm for robot grasp pose detection. First, we employ a 3D object detector named RCV, which requires no 3D annotations, to achieve rapid 3D detection in new scenes. Leveraging the 3D bounding box and human grasp priors, our method achieves dense grasp pose detection. The experimental evaluation involves 18 common objects categorized into 7 classes based on shape. Without grasp training, our method generates dense grasp poses for 1000 scenes. We compare our method's grasp poses to existing approaches using a novel stability metric, demonstrating significantly higher grasp pose stability. In user-specified robot grasping experiments, our approach achieves a 94% grasp success rate. Moreover, in user-specified grasping experiments under partial occlusion, the success rate reaches 92%.

* 8 pages, 5 figures
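
The abstract describes deriving dense grasp poses from a 3D bounding box plus simple human grasp priors, with no grasp training. The priors themselves are not spelled out here, so the sketch below assumes one plausible prior: approach a gravity-aligned box from above and close the fingers across its narrower horizontal side, sampling grasp centers along the longer side. `Box3D`, `Grasp`, `top_down_grasps`, and the default gripper width and grasp depth are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Box3D:
    cx: float  # center, world frame with z pointing up
    cy: float
    cz: float
    lx: float  # full extents along the box's local x, y, z axes
    ly: float
    lz: float
    yaw: float  # rotation of the box about world z, in radians

@dataclass
class Grasp:
    x: float  # gripper center
    y: float
    z: float
    closing_yaw: float  # world-frame heading of the finger-closing direction
    width: float  # opening the gripper must reach

def top_down_grasps(box: Box3D, max_width: float = 0.08,
                    grasp_depth: float = 0.02, n: int = 5) -> list[Grasp]:
    """Dense top-down grasps under an assumed 'pinch the narrow side' prior."""
    if box.lx <= box.ly:
        width, length = box.lx, box.ly
        close_yaw, slide_yaw = box.yaw, box.yaw + math.pi / 2
    else:
        width, length = box.ly, box.lx
        close_yaw, slide_yaw = box.yaw + math.pi / 2, box.yaw
    if width > max_width:
        return []  # too wide to pinch with this gripper under this prior
    # Grasp center sits grasp_depth below the top face, but never below the box center.
    z = box.cz + box.lz / 2 - min(grasp_depth, box.lz / 2)
    grasps = []
    for i in range(n):
        offset = ((i + 0.5) / n - 0.5) * length  # evenly spaced, symmetric about the center
        grasps.append(Grasp(box.cx + offset * math.cos(slide_yaw),
                            box.cy + offset * math.sin(slide_yaw),
                            z, close_yaw, width))
    return grasps

carton = Box3D(0.0, 0.0, 0.1, 0.05, 0.2, 0.2, 0.0)
poses = top_down_grasps(carton)
```

Because every pose is computed from the box geometry alone, the only learned component in this kind of pipeline is the 3D detector that supplies the box.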

SIR: Multi-view Inverse Rendering with Decomposable Shadow for Indoor Scenes

Feb 25, 2024

Xiaokang Wei, Zhuoman Liu, Yan Luximon

Abstract: We propose SIR, an efficient method that decomposes differentiable shadows for inverse rendering of indoor scenes using multi-view data, addressing the challenge of accurately decomposing materials and lighting conditions. Unlike previous methods that struggle with shadow fidelity in complex lighting environments, our approach explicitly learns shadows for enhanced realism in material estimation under unknown light positions. Using posed HDR images as input, SIR employs an SDF-based neural radiance field for comprehensive scene representation. SIR then integrates a shadow term with a three-stage material estimation approach to improve SVBRDF quality. Specifically, SIR is designed to learn a differentiable shadow, complemented by BRDF regularization, to optimize inverse rendering accuracy. Extensive experiments on both synthetic and real-world indoor scenes demonstrate the superior performance of SIR over existing methods in both quantitative metrics and qualitative analysis. SIR's strong decomposition ability enables sophisticated editing capabilities such as free-view relighting, object insertion, and material replacement. The code and data are available at https://xiaokangwei.github.io/SIR/.
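
To see where an explicit shadow term enters the shading, consider the simplest case: Lambertian reflection from point lights, where each light's contribution is multiplied by a visibility factor V in [0, 1]. The sketch below uses a hard visibility test against a hypothetical sphere occluder. SIR instead learns a differentiable shadow and estimates a full SVBRDF, so `shade_with_shadow` and `sphere_visibility` are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Callable, Sequence

Vec3 = tuple[float, float, float]
Visibility = Callable[[Vec3, Vec3], float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def shade_with_shadow(p: Vec3, n: Vec3, albedo: float,
                      lights: Sequence[tuple[Vec3, float]],
                      visibility: Visibility) -> float:
    """L_o(p) = sum_k V_k * (albedo / pi) * I_k * max(0, n . l_k) / d_k^2."""
    radiance = 0.0
    for light_pos, intensity in lights:
        to_light = sub(light_pos, p)
        dist2 = dot(to_light, to_light)
        inv_len = 1.0 / math.sqrt(dist2)
        l_dir = (to_light[0] * inv_len, to_light[1] * inv_len, to_light[2] * inv_len)
        cos_term = max(0.0, dot(n, l_dir))
        radiance += visibility(p, light_pos) * albedo / math.pi * intensity * cos_term / dist2
    return radiance

def sphere_visibility(center: Vec3, radius: float) -> Visibility:
    """Hard shadow test: 0 if the segment p -> light passes through the sphere, else 1."""
    def vis(p: Vec3, light_pos: Vec3) -> float:
        seg = sub(light_pos, p)
        # Closest point on the segment to the sphere center (parameter clamped to [0, 1]).
        t = max(0.0, min(1.0, dot(sub(center, p), seg) / dot(seg, seg)))
        closest = (p[0] + t * seg[0], p[1] + t * seg[1], p[2] + t * seg[2])
        off = sub(closest, center)
        return 0.0 if dot(off, off) < radius * radius else 1.0
    return vis

point, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
lights = [((0.0, 0.0, 1.0), 1.0)]
lit = shade_with_shadow(point, normal, 1.0, lights, sphere_visibility((5.0, 5.0, 5.0), 0.1))
shadowed = shade_with_shadow(point, normal, 1.0, lights, sphere_visibility((0.0, 0.0, 0.5), 0.1))
```

Without the V factor, a shadowed pixel can only be explained by a darker albedo, which is exactly the material-lighting ambiguity an explicit shadow term is meant to remove.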

Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object Detection without 3D Annotations

Nov 14, 2022

Shun Gui, Yan Luximon

Abstract: Heavy reliance on 3D annotations limits the real-world application of 3D object detection. In this paper, we propose a method that does not require any 3D annotations yet is able to predict full-oriented 3D bounding boxes. Our method, called Recursive Cross-View (RCV), transforms 3D detection into several 2D detection tasks, which require only 2D labels, based on the three-view principle. We propose a recursive paradigm in which instance segmentation and 3D bounding box generation by Cross-View are performed recursively until convergence. Specifically, a frustum is proposed via a 2D detector, followed by the recursive paradigm that finally outputs a full-oriented 3D box, class, and score. To show that our method can be quickly applied to new tasks in real-world scenarios, we conduct three experiments, namely indoor 3D human detection, full-oriented 3D hand detection, and real-time detection on a real 3D sensor. RCV achieves decent performance in these experiments. Once trained, our method can be used as a 3D annotation tool. Consequently, we construct two 3D labeled datasets, namely '3D_HUMAN' and 'D_HAND', based on RCV, which could be used to pre-train other 3D detectors. Furthermore, evaluated on the SUN RGB-D benchmark, our method achieves performance comparable to some fully 3D-supervised learning methods. RCV is the first 3D detection method that does not consume 3D labels and yields full-oriented 3D boxes on point clouds.

* 10 pages, 7 figures
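
The first step the abstract describes, turning a 2D detection into a frustum, can be sketched with a pinhole camera model: project each 3D point (camera coordinates, z pointing forward) with intrinsics fx, fy, cx, cy and keep the points whose projection lands inside the 2D box. The final box fit here is a plain axis-aligned bounding box, a deliberate simplification; RCV's recursive cross-view segmentation and full-oriented box estimation are not reproduced, and the function names are hypothetical.

```python
from typing import Sequence

Point = tuple[float, float, float]

def frustum_points(points: Sequence[Point], box2d: tuple[float, float, float, float],
                   fx: float, fy: float, cx: float, cy: float) -> list[Point]:
    """Keep camera-frame points whose pinhole projection falls inside box2d.

    box2d is (u_min, v_min, u_max, v_max) in pixels; points behind the camera are dropped.
    """
    u0, v0, u1, v1 = box2d
    kept = []
    for x, y, z in points:
        if z <= 0.0:
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u0 <= u <= u1 and v0 <= v <= v1:
            kept.append((x, y, z))
    return kept

def axis_aligned_box(points: Sequence[Point]) -> tuple[Point, Point]:
    """Min and max corners of the points (a crude stand-in for an oriented 3D box)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

cloud = [(0.0, 0.0, 2.0), (0.05, -0.05, 2.2), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
inside = frustum_points(cloud, (300.0, 220.0, 340.0, 260.0), 500.0, 500.0, 320.0, 240.0)
```

A frustum still contains background points that project into the same 2D box, which is why the abstract follows it with instance segmentation rather than fitting a box to the raw frustum.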
