Juyong Zhang

Oblique-MERF: Revisiting and Improving MERF for Oblique Photography

Apr 15, 2024

Xiaoyi Zeng, Kaiwen Song, Leyuan Yang, Bailin Deng, Juyong Zhang

Abstract:Neural implicit fields have established a new paradigm for scene representation, with subsequent work achieving high-quality real-time rendering. However, reconstructing 3D scenes from oblique aerial photography presents unique challenges, such as varying spatial scale distributions and a constrained range of tilt angles, often resulting in high memory consumption and reduced rendering quality at extrapolated viewpoints. In this paper, we enhance MERF to accommodate these data characteristics by introducing an innovative adaptive occupancy plane optimized during the volume rendering process and a smoothness regularization term for view-dependent color to address these issues. Our approach, termed Oblique-MERF, surpasses state-of-the-art real-time methods by approximately 0.7 dB, reduces VRAM usage by about 40%, and achieves higher rendering frame rates with more realistic rendering outcomes across most viewpoints.

Neural-ABC: Neural Parametric Models for Articulated Body with Clothes

Apr 06, 2024

Honghu Chen, Yuxin Yao, Juyong Zhang

Abstract:In this paper, we introduce Neural-ABC, a novel parametric model based on neural implicit functions that can represent clothed human bodies with disentangled latent spaces for identity, clothing, shape, and pose. Traditional mesh-based representations struggle to represent articulated bodies with clothes due to the diversity of human body shapes and clothing styles, as well as the complexity of poses. Our proposed model provides a unified framework for parametric modeling, which can represent the identity, clothing, shape and pose of the clothed human body. Our proposed approach utilizes the power of neural implicit functions as the underlying representation and integrates well-designed structures to meet the necessary requirements. Specifically, we represent the underlying body as a signed distance function and clothing as an unsigned distance function, and they can be uniformly represented as unsigned distance fields. Different types of clothing do not require predefined topological structures or classifications, and can follow changes in the underlying body to fit the body. Additionally, we construct poses using a controllable articulated structure. The model is trained on both open and newly constructed datasets, and our decoupling strategy is carefully designed to ensure optimal performance. Our model excels at disentangling clothing and identity across different shapes and poses while preserving the style of the clothing. We demonstrate that Neural-ABC fits new observations of different types of clothing. Compared to other state-of-the-art parametric models, Neural-ABC demonstrates powerful advantages in the reconstruction of clothed human bodies, as evidenced by fitting raw scans, depth maps and images. We show that the attributes of the fitted results can be further edited by adjusting their identities, clothing, shape and pose codes.

* Accepted by IEEE Transactions on Visualization and Computer Graphics. Project page: https://ustc3dv.github.io/NeuralABC/
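
The abstract's remark that a signed distance function can be treated uniformly as an unsigned distance field can be illustrated with a toy example. This is only a sketch under stated assumptions: the analytic sphere stands in for the body, and the names `sphere_sdf` and `to_udf` are hypothetical and not taken from the Neural-ABC code, which uses learned neural fields.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, positive outside.

    The sphere is a hypothetical stand-in for a closed body surface.
    """
    return math.dist(point, center) - radius

def to_udf(signed_distance):
    """Drop the sign so the value is a plain distance to the surface."""
    return abs(signed_distance)

center = (0.0, 0.0, 0.0)
inside = sphere_sdf((0.0, 0.0, 0.0), center, 1.0)   # negative: inside the body
outside = sphere_sdf((2.0, 0.0, 0.0), center, 1.0)  # positive: outside the body
print(inside, outside, to_udf(inside), to_udf(outside))
```

Open surfaces such as clothing have no inside or outside, so only the unsigned form is defined for them; taking the absolute value of the body's signed distance puts both on the same footing.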

DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction

Mar 18, 2024

Yuxin Yao, Siyu Ren, Junhui Hou, Zhi Deng, Juyong Zhang, Wenping Wang

Abstract:This paper explores the problem of reconstructing temporally consistent surfaces from a 3D point cloud sequence without correspondence. To address this challenging task, we propose DynoSurf, an unsupervised learning framework integrating a template surface representation with a learnable deformation field. Specifically, we design a coarse-to-fine strategy for learning the template surface based on the deformable tetrahedron representation. Furthermore, we propose a learnable deformation representation based on the learnable control points and blending weights, which can deform the template surface non-rigidly while maintaining the consistency of the local shape. Experimental results demonstrate the significant superiority of DynoSurf over current state-of-the-art approaches, showcasing its potential as a powerful tool for dynamic mesh reconstruction. The code is publicly available at https://github.com/yaoyx689/DynoSurf.
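
As a rough illustration of deforming a template with control points and blending weights, the sketch below moves a vertex by a weighted mix of per-control-point translations. It rests on assumptions that are not in the abstract: the weights come from a fixed Gaussian falloff (in DynoSurf they are learnable), each control point carries only a translation, and the names `blend_weights`, `deform`, and `sigma` are hypothetical.

```python
import math

def blend_weights(vertex, control_points, sigma=0.5):
    """Normalized Gaussian weights; a fixed stand-in for learned blending weights."""
    raw = [math.exp(-math.dist(vertex, c) ** 2 / (2.0 * sigma ** 2))
           for c in control_points]
    total = sum(raw)
    if total == 0.0:
        # All weights underflowed (vertex far from every control point).
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def deform(vertex, control_points, translations, sigma=0.5):
    """Move a vertex by the weighted average of the control-point translations."""
    weights = blend_weights(vertex, control_points, sigma)
    return tuple(
        coord + sum(w * t[axis] for w, t in zip(weights, translations))
        for axis, coord in enumerate(vertex)
    )

control_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
translations = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]  # only the first control point moves
print(deform((0.5, 0.0, 0.0), control_points, translations))
```

Because nearby vertices receive similar weights, they move together, which is the intuition behind keeping local shape consistent under a non-rigid deformation.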

City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web

Dec 27, 2023

Kaiwen Song, Juyong Zhang

Abstract:NeRF has significantly advanced 3D scene reconstruction, capturing intricate details across various environments. Existing methods have successfully leveraged radiance field baking to facilitate real-time rendering of small scenes. However, when applied to large-scale scenes, these techniques encounter significant challenges, struggling to provide a seamless real-time experience due to limited resources in computation, memory, and bandwidth. In this paper, we propose City-on-Web, which represents the whole scene by partitioning it into manageable blocks, each with its own Level-of-Detail, ensuring high fidelity, efficient memory management and fast rendering. Meanwhile, we carefully design the training and inference process such that the final rendering result on web is consistent with training. Thanks to our novel representation and carefully designed training/inference process, we are the first to achieve real-time rendering of large-scale scenes in resource-constrained environments. Extensive experimental results demonstrate that our method facilitates real-time rendering of large-scale scenes on a web platform, achieving 32FPS at 1080P resolution with an RTX 3060 GPU, while simultaneously achieving a quality that closely rivals that of state-of-the-art methods. Project page: https://ustc3dv.github.io/City-on-Web/

* Project page: https://ustc3dv.github.io/City-on-Web/

HeadRecon: High-Fidelity 3D Head Reconstruction from Monocular Video

Dec 14, 2023

Xueying Wang, Juyong Zhang

Abstract:Recently, the reconstruction of high-fidelity 3D head models from static portrait images has made great progress. However, most methods require multi-view or multi-illumination information, which places high demands on data acquisition. In this paper, we study the reconstruction of high-fidelity 3D head models from arbitrary monocular videos. Non-rigid structure from motion (NRSFM) methods have been widely used to solve such problems according to the two-dimensional correspondence between different frames. However, the inaccurate correspondence caused by highly complex hair structures and varied facial expression changes can heavily degrade the reconstruction accuracy. To tackle these problems, we propose a prior-guided dynamic implicit neural network. Specifically, we design a two-part dynamic deformation field to transform the current frame space to the canonical one. We further model the head geometry in the canonical space with a learnable signed distance field (SDF) and optimize it using volumetric rendering with the guidance of two main head priors to improve the reconstruction accuracy and robustness. Extensive ablation studies and comparisons with state-of-the-art methods demonstrate the effectiveness and robustness of our proposed method.

FlashAvatar: High-Fidelity Digital Avatar Rendering at 300FPS

Dec 03, 2023

Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang

Abstract:We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that can reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn an extra spatial offset to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Project page: https://ustc3dv.github.io/FlashAvatar/

* Project page: https://ustc3dv.github.io/FlashAvatar/
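
One way to picture a Gaussian that is embedded in a mesh surface and then nudged by a learned offset is shown below. This is a minimal sketch, assuming the surface attachment is expressed with barycentric coordinates on a single triangle; the abstract does not say how FlashAvatar parameterizes the surface, and `gaussian_center` and its arguments are hypothetical names.

```python
def gaussian_center(triangle, barycentric, offset):
    """Return a surface point (barycentric mix of the triangle corners) plus an offset.

    triangle:    three (x, y, z) corners of one mesh face
    barycentric: three weights that sum to 1 and fix the point on the face
    offset:      extra displacement, a stand-in for a learned spatial offset
    """
    a, b, c = triangle
    u, v, w = barycentric
    return tuple(
        u * a[axis] + v * b[axis] + w * c[axis] + offset[axis]
        for axis in range(3)
    )

triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
on_surface = gaussian_center(triangle, (0.5, 0.5, 0.0), (0.0, 0.0, 0.0))
lifted = gaussian_center(triangle, (0.5, 0.5, 0.0), (0.0, 0.0, 0.1))
print(on_surface, lifted)
```

When the face model animates and the triangle corners move, a center defined this way follows the surface automatically, while the offset lets it sit slightly off the mesh.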

CosAvatar: Consistent and Animatable Portrait Video Tuning with Text Prompt

Nov 30, 2023

Haiyao Xiao, Chenglai Zhong, Xuan Gao, Yudong Guo, Juyong Zhang

Abstract:Recently, text-guided digital portrait editing has attracted increasing attention. However, existing methods still struggle to maintain consistency across time, expression, and view or require specific data prerequisites. To solve these challenging problems, we propose CosAvatar, a high-quality and user-friendly framework for portrait tuning. With only monocular video and text instructions as input, we can produce animatable portraits with both temporal and 3D consistency. Different from methods that directly edit in the 2D domain, we employ a dynamic NeRF-based 3D portrait representation to model both the head and torso. We alternate between editing the dataset of video frames and updating the underlying 3D portrait until the edited frames reach 3D consistency. Additionally, we integrate the semantic portrait priors to enhance the edited results, allowing precise modifications in specified semantic areas. Extensive results demonstrate that our proposed method can not only accurately edit portrait styles or local attributes based on text instructions but also support expressive animation driven by a source video.

* Project page: https://ustc3dv.github.io/CosAvatar/

$L_0$-Sampler: An $L_{0}$ Model Guided Volume Sampling for NeRF

Nov 13, 2023

Liangchen Li, Juyong Zhang

Abstract:Since being proposed, Neural Radiance Fields (NeRF) have achieved great success in related tasks, mainly adopting the hierarchical volume sampling (HVS) strategy for volume rendering. However, the HVS of NeRF approximates distributions using piecewise constant functions, which provides a relatively rough estimation. Based on the observation that a well-trained weight function $w(t)$ and the $L_0$ distance between points and the surface have very high similarity, we propose $L_0$-Sampler by incorporating the $L_0$ model into $w(t)$ to guide the sampling process. Specifically, we propose to use piecewise exponential functions rather than piecewise constant functions for interpolation, which can not only approximate quasi-$L_0$ weight distributions along rays quite well but can also be easily implemented with a few lines of code without additional computational burden. Stable performance improvements can be achieved by applying $L_0$-Sampler to NeRF and its related tasks like 3D reconstruction. Code is available at https://ustc3dv.github.io/L0-Sampler/.

* Project page: https://ustc3dv.github.io/L0-Sampler/
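
To make the idea of piecewise exponential interpolation concrete, the sketch below draws a sample along a ray by inverse transform sampling from a density that is exponential within each bin. It is a generic illustration, not the released $L_0$-Sampler code: it assumes strictly positive weights given at the bin edges, and the names `bin_mass` and `sample_piecewise_exponential` are hypothetical.

```python
import math

def bin_mass(w0, w1, dt):
    """Integral over one bin of w0 * exp(k * s), with k chosen so the value at dt is w1."""
    if math.isclose(w0, w1):
        return w0 * dt  # constant density inside this bin
    k = math.log(w1 / w0) / dt
    return (w1 - w0) / k

def sample_piecewise_exponential(edges, weights, u):
    """Map u in [0, 1] to a position t by inverting the piecewise exponential CDF.

    edges:   increasing sample positions along the ray
    weights: strictly positive weights at those positions (an assumption)
    """
    last = len(edges) - 2
    masses = [bin_mass(weights[i], weights[i + 1], edges[i + 1] - edges[i])
              for i in range(last + 1)]
    target = u * sum(masses)
    for i, mass in enumerate(masses):
        if target <= mass or i == last:
            w0, w1 = weights[i], weights[i + 1]
            dt = edges[i + 1] - edges[i]
            target = min(target, mass)  # guard against rounding past the bin end
            if math.isclose(w0, w1):
                return edges[i] + target / w0
            k = math.log(w1 / w0) / dt
            # Solve w0 * (exp(k * s) - 1) / k = target for s.
            return edges[i] + math.log1p(k * target / w0) / k
        target -= mass

edges = [0.0, 1.0, 2.0]
weights = [0.1, 2.0, 0.1]  # weight peaks near t = 1, as it would near a surface
print([round(sample_piecewise_exponential(edges, weights, u), 3) for u in (0.1, 0.5, 0.9)])
```

With equal weights at both ends of a bin the density is flat and the routine reduces to the usual piecewise constant sampling; with unequal weights the samples crowd toward the heavier end of the bin.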

MetaHead: An Engine to Create Realistic Digital Head

Apr 03, 2023

Dingyun Zhang, Chenglai Zhong, Yudong Guo, Yang Hong, Juyong Zhang

Abstract:Collecting and labeling training data is an important step for learning-based methods, but the process is time-consuming and prone to bias. For face analysis tasks, although some generative models can be used to generate face data, they can only achieve a subset of generation diversity, reconstruction accuracy, 3D consistency, high-fidelity visual quality, and easy editability. One recent related work is the graphics-based generative method, but it can only render heads of low realism at high computational cost. In this paper, we propose MetaHead, a unified and full-featured controllable digital head engine, which consists of a controllable head radiance field (MetaHead-F) to super-realistically generate or reconstruct view-consistent 3D controllable digital heads and a generic top-down image generation framework LabelHead to generate digital heads consistent with the given customizable feature labels. Experiments validate that our controllable digital head engine achieves state-of-the-art generation visual quality and reconstruction accuracy. Moreover, the generated labeled data can assist real training data and significantly surpass the labeled data generated by graphics-based methods in terms of training effect.

* Project page: https://ustc3dv.github.io/MetaHead/

IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF

Mar 09, 2023

Bo Peng, Jun Hu, Jingtao Zhou, Xuan Gao, Juyong Zhang

Abstract:Recently, many works have been proposed to utilize the neural radiance field for novel view synthesis of human performers. However, most of these methods require hours of training, making them difficult for practical use. To address this challenging problem, we propose IntrinsicNGP, which can train from scratch and achieve high-fidelity results in a few minutes with videos of a human performer. To achieve this target, we introduce a continuous and optimizable intrinsic coordinate rather than the original explicit Euclidean coordinate in the hash encoding module of instant-NGP. With this novel intrinsic coordinate, IntrinsicNGP can aggregate inter-frame information for dynamic objects with the help of proxy geometry shapes. Moreover, the results trained with the given rough geometry shapes can be further refined with an optimizable offset field based on the intrinsic coordinate. Extensive experimental results on several datasets demonstrate the effectiveness and efficiency of IntrinsicNGP. We also illustrate our approach's ability to edit the shape of reconstructed subjects.

* Project page: https://ustc3dv.github.io/IntrinsicNGP/. arXiv admin note: substantial text overlap with arXiv:2210.01651
