Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Numair Khan

DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Jun 11, 2025

Chieh Hubert Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang(+4 more)

Abstract:We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.

* Project page: https://hubert0527.github.io/dgslrm/

Via

Access Paper or Ask Questions

Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency

May 25, 2025

Hyunho Ha, Lei Xiao, Christian Richardt, Thu Nguyen-Phuoc, Changil Kim, Min H. Kim, Douglas Lanman, Numair Khan

Abstract:We introduce a novel geometry-guided online video view synthesis method with enhanced view and temporal consistency. Traditional approaches achieve high-quality synthesis from dense multi-view camera setups but require significant computational resources. In contrast, selective-input methods reduce this cost but often compromise quality, leading to multi-view and temporal inconsistencies such as flickering artifacts. Our method addresses this challenge to deliver efficient, high-quality novel-view synthesis with view and temporal consistency. The key innovation of our approach lies in using global geometry to guide an image-based rendering pipeline. To accomplish this, we progressively refine depth maps using color difference masks across time. These depth maps are then accumulated through truncated signed distance fields in the synthesized view's image space. This depth representation is view and temporally consistent, and is used to guide a pre-trained blending network that fuses multiple forward-rendered input-view images. Thus, the network is encouraged to output geometrically consistent synthesis results across multiple views and time. Our approach achieves consistent, high-quality video synthesis, while running efficiently in an online manner.

* Accepted by CVPR 2025. Project website: https://nkhan2.github.io/projects/geometry-guided-2025/index.html

Via

Access Paper or Ask Questions

ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields

Jan 31, 2024

Edward Bartrum, Thu Nguyen-Phuoc, Chris Xie, Zhengqin Li, Numair Khan, Armen Avetisyan, Douglas Lanman, Lei Xiao

Abstract:We introduce ReplaceAnything3D model (RAM3D), a novel text-guided 3D scene editing method that enables the replacement of specific objects within a scene. Given multi-view images of a scene, a text prompt describing the object to replace, and a text prompt describing the new object, our Erase-and-Replace approach can effectively swap objects in the scene with newly generated content while maintaining 3D consistency across multiple viewpoints. We demonstrate the versatility of ReplaceAnything3D by applying it to various realistic 3D scenes, showcasing results of modified foreground objects that are well-integrated with the rest of the scene without affecting its overall integrity.

* For our project page, see https://replaceanything3d.github.io/

Via

Access Paper or Ask Questions

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Jan 17, 2024

Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong(+1 more)

Figure 1 for TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Figure 2 for TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Figure 3 for TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Figure 4 for TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Abstract:We present TextureDreamer, a novel image-guided texture synthesis method to transfer relightable textures from a small number of input images (3 to 5) to target 3D shapes across arbitrary categories. Texture creation is a pivotal challenge in vision and graphics. Industrial companies hire experienced artists to manually craft textures for 3D assets. Classical methods require densely sampled views and accurately aligned geometry, while learning-based methods are confined to category-specific shapes within the dataset. In contrast, TextureDreamer can transfer highly detailed, intricate textures from real-world environments to arbitrary objects with only a few casually captured images, potentially significantly democratizing texture creation. Our core idea, personalized geometry-aware score distillation (PGSD), draws inspiration from recent advancements in diffuse models, including personalized modeling for texture information extraction, variational score distillation for detailed appearance synthesis, and explicit geometry guidance with ControlNet. Our integration and several essential modifications substantially improve the texture quality. Experiments on real images spanning different categories show that TextureDreamer can successfully transfer highly realistic, semantic meaningful texture to arbitrary objects, surpassing the visual quality of previous state-of-the-art.

* Project page: https://texturedreamer.github.io

Via

Access Paper or Ask Questions

GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Dec 18, 2023

Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao

Figure 1 for GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Figure 2 for GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Figure 3 for GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Figure 4 for GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Abstract:We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. Project website: https://lynl7130.github.io/gaufre/index.html

* 10 pages, 8 figures, 4 tables

Via

Access Paper or Ask Questions

Tiled Multiplane Images for Practical 3D Photography

Sep 25, 2023

Numair Khan, Douglas Lanman, Lei Xiao

Figure 1 for Tiled Multiplane Images for Practical 3D Photography

Figure 2 for Tiled Multiplane Images for Practical 3D Photography

Figure 3 for Tiled Multiplane Images for Practical 3D Photography

Figure 4 for Tiled Multiplane Images for Practical 3D Photography

Abstract:The task of synthesizing novel views from a single image has useful applications in virtual reality and mobile computing, and a number of approaches to the problem have been proposed in recent years. A Multiplane Image (MPI) estimates the scene as a stack of RGBA layers, and can model complex appearance effects, anti-alias depth errors and synthesize soft edges better than methods that use textured meshes or layered depth images. And unlike neural radiance fields, an MPI can be efficiently rendered on graphics hardware. However, MPIs are highly redundant and require a large number of depth layers to achieve plausible results. Based on the observation that the depth complexity in local image regions is lower than that over the entire image, we split an MPI into many small, tiled regions, each with only a few depth planes. We call this representation a Tiled Multiplane Image (TMPI). We propose a method for generating a TMPI with adaptive depth planes for single-view 3D photography in the wild. Our synthesized results are comparable to state-of-the-art single-view MPI methods while having lower computational overhead.

* ICCV 2023

Via

Access Paper or Ask Questions

Temporally Consistent Online Depth Estimation Using Point-Based Fusion

May 01, 2023

Numair Khan, Eric Penner, Douglas Lanman, Lei Xiao

Abstract:Depth estimation is an important step in many computer vision problems such as 3D reconstruction, novel view synthesis, and computational photography. Most existing work focuses on depth estimation from single frames. When applied to videos, the result lacks temporal consistency, showing flickering and swimming artifacts. In this paper we aim to estimate temporally consistent depth maps of video streams in an online setting. This is a difficult problem as future frames are not available and the method must choose between enforcing consistency and correcting errors from previous estimations. The presence of dynamic objects further complicates the problem. We propose to address these challenges by using a global point cloud that is dynamically updated each frame, along with a learned fusion approach in image space. Our approach encourages consistency while simultaneously allowing updates to handle errors and dynamic objects. Qualitative and quantitative results show that our method achieves state-of-the-art quality for consistent video depth estimation.

* CVPR 2023
* Supplementary video at https://research.facebook.com/publications/temporally-consistent-online-depth-estimation-using-point-based-fusion/

Via

Access Paper or Ask Questions

Neural Fields in Visual Computing and Beyond

Nov 29, 2021

Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, Srinath Sridhar

Figure 1 for Neural Fields in Visual Computing and Beyond

Figure 2 for Neural Fields in Visual Computing and Beyond

Figure 3 for Neural Fields in Visual Computing and Beyond

Figure 4 for Neural Fields in Visual Computing and Beyond

Abstract:Recent advances in machine learning have created increasing interest in solving visual computing problems using a class of coordinate-based neural networks that parametrize physical properties of scenes or objects across space and time. These methods, which we call neural fields, have seen successful application in the synthesis of 3D shapes and image, animation of human bodies, 3D reconstruction, and pose estimation. However, due to rapid progress in a short time, many papers exist but a comprehensive review and formulation of the problem has not yet emerged. In this report, we address this limitation by providing context, mathematical grounding, and an extensive review of literature on neural fields. This report covers research along two dimensions. In Part I, we focus on techniques in neural fields by identifying common components of neural field methods, including different representations, architectures, forward mapping, and generalization methods. In Part II, we focus on applications of neural fields to different problems in visual computing, and beyond (e.g., robotics, audio). Our review shows the breadth of topics already covered in visual computing, both historically and in current incarnations, demonstrating the improved quality, flexibility, and capability brought by neural fields methods. Finally, we present a companion website that contributes a living version of this review that can be continually updated by the community.

* Equal advising: Vincent Sitzmann and Srinath Sridhar

Via

Access Paper or Ask Questions

Edge-aware Bidirectional Diffusion for Dense Depth Estimation from Light Fields

Jul 07, 2021

Numair Khan, Min H. Kim, James Tompkin

Figure 1 for Edge-aware Bidirectional Diffusion for Dense Depth Estimation from Light Fields

Figure 2 for Edge-aware Bidirectional Diffusion for Dense Depth Estimation from Light Fields

Figure 3 for Edge-aware Bidirectional Diffusion for Dense Depth Estimation from Light Fields

Figure 4 for Edge-aware Bidirectional Diffusion for Dense Depth Estimation from Light Fields

Abstract:We present an algorithm to estimate fast and accurate depth maps from light fields via a sparse set of depth edges and gradients. Our proposed approach is based around the idea that true depth edges are more sensitive than texture edges to local constraints, and so they can be reliably disambiguated through a bidirectional diffusion process. First, we use epipolar-plane images to estimate sub-pixel disparity at a sparse set of pixels. To find sparse points efficiently, we propose an entropy-based refinement approach to a line estimate from a limited set of oriented filter banks. Next, to estimate the diffusion direction away from sparse points, we optimize constraints at these points via our bidirectional diffusion method. This resolves the ambiguity of which surface the edge belongs to and reliably separates depth from texture edges, allowing us to diffuse the sparse set in a depth-edge and occlusion-aware manner to obtain accurate dense depth maps.

* Project webpage: http://visual.cs.brown.edu/lightfielddepth

Via

Access Paper or Ask Questions

Differentiable Diffusion for Dense Depth Estimation from Multi-view Images

Jun 29, 2021

Numair Khan, Min H. Kim, James Tompkin

Figure 1 for Differentiable Diffusion for Dense Depth Estimation from Multi-view Images

Figure 2 for Differentiable Diffusion for Dense Depth Estimation from Multi-view Images

Figure 3 for Differentiable Diffusion for Dense Depth Estimation from Multi-view Images

Figure 4 for Differentiable Diffusion for Dense Depth Estimation from Multi-view Images

Abstract:We present a method to estimate dense depth by optimizing a sparse set of points such that their diffusion into a depth map minimizes a multi-view reprojection error from RGB supervision. We optimize point positions, depths, and weights with respect to the loss by differential splatting that models points as Gaussians with analytic transmittance. Further, we develop an efficient optimization routine that can simultaneously optimize the 50k+ points required for complex scene reconstruction. We validate our routine using ground truth data and show high reconstruction quality. Then, we apply this to light field and wider baseline images via self supervision, and show improvements in both average and outlier error for depth maps diffused from inaccurate sparse points. Finally, we compare qualitative and quantitative results to image processing and deep learning methods. http://visual.cs.brown.edu/diffdiffdepth

* CVPR 2021

Via

Access Paper or Ask Questions