Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongki Jung

UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting

Jun 05, 2025

Jaehoon Choi, Dongki Jung, Christopher Maxey, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon

Abstract:Despite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV-captured scenarios, particularly those involving monocular camera setups, top-down perspective, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for enabling photorealistic rendering for dynamic real-world scenes captured by UAVs. Specifically, we address the challenge of reconstructing dynamic scenes with multiple moving pedestrians from monocular video data without the need for additional sensors. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. We propose a novel approach to resolve the scene scale ambiguity and place both humans and the scene in world coordinates by identifying human-scene contact points. Additionally, we exploit the SMPL model and background mesh to initialize Gaussian splats, enabling holistic scene rendering. We evaluated our method on three complex UAV-captured datasets: VisDrone, Manipal-UAV, and Okutama-Action, each with distinct characteristics and 10~50 humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.

Via

Access Paper or Ask Questions

UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

Apr 02, 2025

Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon

Figure 1 for UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

Figure 2 for UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

Figure 3 for UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

Figure 4 for UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting

Abstract:We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). Specifically, our approach focuses on synthesizing foreground components, such as various human instances in motion within complex scene backgrounds, from UAV perspectives. This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses. To the best of our knowledge, UAVTwin is the first approach for UAV-based perception that is capable of generating high-fidelity digital twins based on 3DGS. The proposed work significantly enhances downstream models through data augmentation for real-world environments with multiple dynamic objects and significant appearance variations-both of which typically introduce artifacts in 3DGS-based modeling. To tackle these challenges, we propose a novel appearance modeling strategy and a mask refinement module to enhance the training of 3D Gaussian Splatting. We demonstrate the high quality of neural rendering by achieving a 1.23 dB improvement in PSNR compared to recent methods. Furthermore, we validate the effectiveness of data augmentation by showing a 2.5% to 13.7% improvement in mAP for the human detection task.

Via

Access Paper or Ask Questions

EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Feb 28, 2025

Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, Suyong Yeon

Figure 1 for EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Figure 2 for EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Figure 3 for EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Figure 4 for EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

Abstract:We introduce the first learning-based dense matching algorithm, termed Equirectangular Projection-Oriented Dense Kernelized Feature Matching (EDM), specifically designed for omnidirectional images. Equirectangular projection (ERP) images, with their large fields of view, are particularly suited for dense matching techniques that aim to establish comprehensive correspondences across images. However, ERP images are subject to significant distortions, which we address by leveraging the spherical camera model and geodesic flow refinement in the dense matching method. To further mitigate these distortions, we propose spherical positional embeddings based on 3D Cartesian coordinates of the feature grid. Additionally, our method incorporates bidirectional transformations between spherical and Cartesian coordinate systems during refinement, utilizing a unit sphere to improve matching performance. We demonstrate that our proposed method achieves notable performance enhancements, with improvements of +26.72 and +42.62 in AUC@5{\deg} on the Matterport3D and Stanford2D3D datasets.

Via

Access Paper or Ask Questions

IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras

Feb 19, 2025

Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha

$Figure 1 for IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras$

$Figure 2 for IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras$

$Figure 3 for IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras$

$Figure 4 for IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras$

Abstract:We present a novel 3D reconstruction pipeline for 360$^\circ$ cameras for 3D mapping and rendering of indoor environments. Traditional Structure-from-Motion (SfM) methods may not work well in large-scale indoor scenes due to the prevalence of textureless and repetitive regions. To overcome these challenges, our approach (IM360) leverages the wide field of view of omnidirectional images and integrates the spherical camera model into every core component of the SfM pipeline. In order to develop a comprehensive 3D reconstruction solution, we integrate a neural implicit surface reconstruction technique to generate high-quality surfaces from sparse input data. Additionally, we utilize a mesh-based neural rendering approach to refine texture maps and accurately capture view-dependent properties by combining diffuse and specular components. We evaluate our pipeline on large-scale indoor scenes from the Matterport3D and Stanford2D3D datasets. In practice, IM360 demonstrate superior performance in terms of textured mesh reconstruction over SOTA. We observe accuracy improvements in terms of camera localization and registration as well as rendering high frequency details.

Via

Access Paper or Ask Questions

Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

Oct 06, 2024

Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon

Abstract:We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.

Via

Access Paper or Ask Questions

TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Mar 27, 2023

Jaehoon Choi, Dongki Jung, Taejae Lee, Sangwook Kim, Youngdong Jung, Dinesh Manocha, Donghwan Lee

Figure 1 for TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Figure 2 for TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Figure 3 for TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Figure 4 for TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Abstract:We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-quality mesh and develops a new training process for applying a regularization provided by classical multi-view stereo methods. Moreover, we apply a differentiable rendering to fine-tune incomplete texture maps and generate textures which are perceptually closer to the original scene. Our pipeline can be applied to any common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results of captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture mapping methods.

* Accepted to CVPR23. Project Page: https://jh-choi.github.io/TMO/

Via

Access Paper or Ask Questions

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Mar 10, 2022

Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee

Figure 1 for SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Figure 2 for SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Figure 3 for SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Figure 4 for SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Abstract:Monocular depth estimation in the wild inherently predicts depth up to an unknown scale. To resolve scale ambiguity issue, we present a learning algorithm that leverages monocular simultaneous localization and mapping (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation. Our approach is based on a teacher-student formulation which guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on EuRoC, OpenLORIS, and ScanNet datasets.

Via

Access Paper or Ask Questions

DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Aug 12, 2021

Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim, Dinesh Manocha, Donghwan Lee

Figure 1 for DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Figure 2 for DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Figure 3 for DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Figure 4 for DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Abstract:We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring depths produced from depth sensing devices. Our network leverages RGB images and sparse depth maps generated from traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without tracking their motion explicitly. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes.

Via

Access Paper or Ask Questions

SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Nov 10, 2020

Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee

Figure 1 for SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Figure 2 for SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Figure 3 for SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Figure 4 for SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Abstract:We present a novel algorithm for self-supervised monocular depth completion. Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels. Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surface, non-Lambertian surfaces, moving people, longer and diverse depth ranges and scenes captured by complex ego-motions. Our novel architecture leverages both deep stacks of sparse convolution blocks to extract sparse depth features and pixel-adaptive convolutions to fuse image and depth features. We compare with existing approaches in NYUv2, KITTI and NAVERLABS indoor datasets, and observe 5\:-\:34 \% improvements in root-means-square error (RMSE) reduction.

Via

Access Paper or Ask Questions

Self-supervised Monocular Depth Estimation with Semantic-aware Depth Features

Oct 13, 2020

Jaehoon Choi, Dongki Jung, Donghwan Lee, Changick Kim

Figure 1 for Self-supervised Monocular Depth Estimation with Semantic-aware Depth Features

Figure 2 for Self-supervised Monocular Depth Estimation with Semantic-aware Depth Features

Figure 3 for Self-supervised Monocular Depth Estimation with Semantic-aware Depth Features

Figure 4 for Self-supervised Monocular Depth Estimation with Semantic-aware Depth Features

Abstract:Self-supervised monocular depth estimation has emerged as a promising method because it does not require groundtruth depth maps during training. As an alternative for the groundtruth depth map, the photometric loss enables to provide self-supervision on depth prediction by matching the input image frames. However, the photometric loss causes various problems, resulting in less accurate depth values compared with supervised approaches. In this paper, we propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss. Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge. Therefore, we introduce multi-task learning schemes to incorporate semantic-awareness into the representation of depth features. Experiments on KITTI dataset demonstrate that our methods compete or even outperform the state-of-the-art methods. Furthermore, extensive experiments on different datasets show its better generalization ability and robustness to various conditions, such as low-light or adverse weather.

Via

Access Paper or Ask Questions