Lingjie Liu

EgoRenderer: Rendering Human Avatars from Egocentric Camera Images

Nov 24, 2021
Tao Hu, Kripasindhu Sarkar, Lingjie Liu, Matthias Zwicker, Christian Theobalt

We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera that is mounted on a cap or a VR headset. Our system renders photorealistic novel views of the actor and her motion from arbitrary virtual camera locations. Rendering full-body avatars from such egocentric images comes with unique challenges due to the top-down view and large distortions. We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation. For texture synthesis, we propose Ego-DPNet, a neural network that infers dense correspondences between the input fisheye images and an underlying parametric body model and extracts textures from egocentric inputs. In addition, to encode dynamic appearances, our approach also learns an implicit texture stack that captures detailed appearance variation across poses and viewpoints. For correct pose generation, we first estimate body pose from the egocentric view using a parametric model. We then synthesize an external free-viewpoint pose image by projecting the parametric model to the user-specified target viewpoint. We next combine the target pose image and the textures into a combined feature image, which is transformed into the output color image using a neural image translation network. Experimental evaluations show that EgoRenderer is capable of generating realistic free-viewpoint avatars of a person wearing an egocentric camera. Comparisons to several baselines demonstrate the advantages of our approach.

* ICCV 2021. https://vcai.mpi-inf.mpg.de/projects/EgoRenderer/

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis

Oct 18, 2021
Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt

We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and-out, style mixing, inversion, and semantic editing.

* 24 pages, 19 figures. Project page: http://jiataogu.me/style_nerf/

Neural Rays for Occlusion-aware Image-based Rendering

Jul 28, 2021
Yuan Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian Theobalt, Xiaowei Zhou, Wenping Wang

We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input. Existing neural scene representations for solving the NVS problem, such as NeRF, cannot generalize to new scenes and take an excessively long time to train on each new scene from scratch. Subsequent neural rendering methods based on stereo matching, such as PixelNeRF, SRF, and IBRNet, are designed to generalize to unseen scenes but suffer from view inconsistency in complex scenes with self-occlusions. To address these issues, our NeuRay method represents every scene by encoding the visibility of rays associated with the input views. This neural representation can be efficiently initialized from depths estimated by external MVS methods, so it generalizes to new scenes and produces satisfactory renderings without any training on the scene. The initialized NeuRay can then be further optimized on every scene with little training time to enforce spatial coherence and ensure view consistency in the presence of severe self-occlusion. Experiments demonstrate that NeuRay can quickly generate high-quality novel view images of unseen scenes with little finetuning and can handle complex scenes with severe self-occlusions, which previous methods struggle with.

* 16 pages and 16 figures

Structure-Aware Long Short-Term Memory Network for 3D Cephalometric Landmark Detection

Jul 21, 2021
Runnan Chen, Yuexin Ma, Nenglun Chen, Lingjie Liu, Zhiming Cui, Yanhong Lin, Wenping Wang

Detecting 3D landmarks on cone-beam computed tomography (CBCT) is crucial to assessing and quantifying the anatomical abnormalities in 3D cephalometric analysis. However, the current methods are time-consuming and suffer from large biases in landmark localization, leading to unreliable diagnosis results. In this work, we propose a novel Structure-Aware Long Short-Term Memory framework (SA-LSTM) for efficient and accurate 3D landmark detection. To reduce the computational burden, SA-LSTM is designed in two stages. It first locates the coarse landmarks via heatmap regression on a down-sampled CBCT volume and then progressively refines landmarks by attentive offset regression using high-resolution cropped patches. To boost accuracy, SA-LSTM captures global-local dependence among the cropped patches via self-attention. Specifically, a graph attention module implicitly encodes the landmarks' global structure to rationalize the predicted positions. Furthermore, a novel attention-gated module recursively filters irrelevant local features and maintains high-confidence local predictions for aggregating the final result. Experiments show that our method significantly outperforms state-of-the-art methods in terms of efficiency and accuracy on an in-house dataset and a public dataset, achieving 1.64 mm and 2.37 mm average errors, respectively, and taking only 0.5 seconds to process a whole CBCT volume of resolution 768×768×576. Moreover, all predicted landmarks are within 8 mm error, which is vital for acceptable cephalometric analysis.
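
The coarse stage can be illustrated with a minimal sketch. It assumes the network outputs one heatmap per landmark on the down-sampled volume and that the coarse position is the heatmap maximum scaled back to full resolution. The function names, the down-sampling factor, and the patch size are hypothetical and not taken from the paper.

```python
import numpy as np

def coarse_landmark(heatmap, factor):
    """Return the heatmap peak mapped to full-resolution voxel coordinates."""
    peak = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array(peak) * factor

def crop_bounds(center, patch, volume_shape):
    """Clamp a cubic patch of edge length `patch` around `center` to the volume.

    Assumes `patch` is no larger than any side of the volume.
    """
    center = np.asarray(center)
    shape = np.asarray(volume_shape)
    start = np.clip(center - patch // 2, 0, shape - patch)
    return start, start + patch

# Toy example: a single peak in an 8x8x8 heatmap, down-sampling factor 4.
heat = np.zeros((8, 8, 8))
heat[2, 3, 4] = 1.0
center = coarse_landmark(heat, factor=4)
start, end = crop_bounds(center, patch=8, volume_shape=(32, 32, 32))
```

The high-resolution patch between `start` and `end` is what a refinement stage would then consume to regress an offset from the coarse estimate.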

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

Jun 20, 2021
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, Wenping Wang

We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground masks as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation that is robust to optimize, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e., bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state of the art in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.
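
One way to turn SDF samples along a ray into rendering weights is to pass the SDF through a logistic CDF and derive a per-interval opacity from how much that CDF decreases. The sketch below follows that idea on a toy ray. The sharpness `s`, the function names, and the planar test surface are assumptions for illustration; the abstract itself does not spell out the formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sdf_to_weights(sdf, s=50.0):
    """Convert SDF samples along one ray into per-interval rendering weights.

    sdf: (N,) signed distances at ordered sample points (positive outside).
    Returns (N-1,) weights, one per interval between consecutive samples.
    """
    cdf = sigmoid(s * sdf)
    # Opacity is the relative drop of the logistic CDF across the interval,
    # clipped so that intervals leaving the surface contribute nothing.
    alpha = np.clip((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-8), 0.0, 1.0)
    # Transmittance: probability the ray reaches interval i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return trans * alpha

# Toy ray that crosses a planar surface at depth t = 1.
t = np.linspace(0.0, 2.0, 201)
weights = sdf_to_weights(1.0 - t)
peak_depth = t[np.argmax(weights)]
```

In this toy case the weights concentrate around the zero crossing of the SDF, which is the behavior a surface-oriented volume renderer is after.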

* 22 pages, 17 figures

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control

Jun 03, 2021
Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, Christian Theobalt

We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high-fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state of the art on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results.

Semi-supervised Anatomical Landmark Detection via Shape-regulated Self-training

May 28, 2021
Runnan Chen, Yuexin Ma, Lingjie Liu, Nenglun Chen, Zhiming Cui, Guodong Wei, Wenping Wang

Well-annotated medical images are costly and sometimes even impossible to acquire, hindering landmark detection accuracy to some extent. Semi-supervised learning alleviates the reliance on large-scale annotated data by exploiting the unlabeled data to understand the population structure of anatomical landmarks. The global shape constraint is an inherent property of anatomical landmarks that provides valuable guidance for more consistent pseudo labelling of the unlabeled data, yet it is ignored in previous semi-supervised methods. In this paper, we propose a model-agnostic shape-regulated self-training framework for semi-supervised landmark detection by fully considering the global shape constraint. Specifically, to ensure pseudo labels are reliable and consistent, a PCA-based shape model adjusts pseudo labels and eliminates abnormal ones. A novel Region Attention loss makes the network automatically focus on the structure-consistent regions around pseudo labels. Extensive experiments show that our approach outperforms other semi-supervised methods and achieves notable improvement on three medical image datasets. Moreover, our framework is flexible and can be used as a plug-and-play module integrated into most supervised methods to improve performance further.
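
The idea of adjusting pseudo labels with a PCA-based shape model can be sketched as follows. This is a generic illustration, assuming landmark sets are flattened into vectors, a linear shape space is fit to labeled shapes, and each pseudo label is projected onto that space. The function names, the number of components, and the toy data are hypothetical, and the paper's actual adjustment and rejection rules may differ.

```python
import numpy as np

def fit_shape_model(shapes, n_components):
    """Fit a linear shape space to (M, K, D) labeled landmark sets."""
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered shapes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def regularize(shape, mean, basis):
    """Project a (K, D) pseudo label onto the shape space.

    Returns the adjusted shape and the residual norm; a large residual
    could be used to discard the pseudo label as abnormal.
    """
    x = shape.reshape(-1) - mean
    proj = basis.T @ (basis @ x)
    return (mean + proj).reshape(shape.shape), np.linalg.norm(x - proj)

# Toy data: three landmarks in 2D whose only variation is a shift along x.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
shift = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]) / np.sqrt(3.0)
train = np.stack([base + a * shift for a in (-1.0, 0.0, 1.0)])
mean, basis = fit_shape_model(train, n_components=1)

# A pseudo label with a plausible shift plus an implausible distortion.
noise = 0.3 * np.array([[0.0, 1.0], [0.0, 0.0], [0.0, -1.0]])
adjusted, residual = regularize(base + 0.5 * shift + noise, mean, basis)
```

Here the projection keeps the shift, which the shape space explains, and removes the distortion, whose magnitude is reported as the residual.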

Real-time Deep Dynamic Characters

May 04, 2021
Marc Habermann, Lingjie Liu, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt

We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance learned in a new weakly supervised way from multi-view imagery. In contrast to previous work, our controllable 3D character displays dynamics, e.g., the swing of the skirt, dependent on skeletal body motion in an efficient data-driven way, without requiring complex physics simulation. Our character model also features a learned dynamic texture model that accounts for photo-realistic motion-dependent appearance details, as well as view-dependent lighting effects. During training, we do not need to resort to difficult dynamic 3D capture of the human; instead we can train our model entirely from multi-view video in a weakly supervised manner. To this end, we propose a parametric and differentiable character representation which allows us to model coarse and fine dynamic deformations, e.g., garment wrinkles, as explicit space-time coherent mesh geometry that is augmented with high-quality dynamic textures dependent on motion and viewpoint. As input to the model, only an arbitrary 3D skeleton motion is required, making it directly compatible with the established 3D animation pipeline. We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing, including dynamics, and a neural generative dynamic texture model creates corresponding dynamic texture maps. We show that by merely providing new skeletal motions, our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, as well as video-realistic surface textures at a much higher level of detail than previous state-of-the-art approaches, and even in real time.

Estimating Egocentric 3D Human Pose in Global Space

Apr 30, 2021
Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt

Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
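
The difference between camera-local and global poses comes down to a rigid transform per frame. The sketch below only illustrates that coordinate change, assuming the head-mounted camera's world rotation `R` and position `t` are already known; the names are hypothetical, and the abstract describes obtaining global poses through a spatio-temporal optimization rather than from a given camera pose.

```python
import numpy as np

def local_to_global(joints_cam, R, t):
    """Map (J, 3) joints from camera coordinates to world coordinates.

    R: (3, 3) camera-to-world rotation, t: (3,) camera position in the world.
    """
    return joints_cam @ R.T + t

# Camera rotated 90 degrees about the z axis and placed at x = 1.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
joints_world = local_to_global(np.array([[1.0, 0.0, 0.0]]), R, t)
```

Because the camera moves with the head, `R` and `t` change every frame, which is why a pose that looks stable in camera coordinates can still be unknown in world coordinates.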

Efficient and Differentiable Shadow Computation for Inverse Problems

Apr 01, 2021
Linjie Lyu, Marc Habermann, Lingjie Liu, Mallikarjun B R, Ayush Tewari, Christian Theobalt

Differentiable rendering has received increasing interest for image-based inverse problems. It can benefit traditional optimization-based solutions to inverse problems, and it also allows for self-supervision of learning-based approaches for which training data with ground-truth annotation is hard to obtain. However, existing differentiable renderers either do not model the visibility of the light sources from the different points in the scene, which is responsible for shadows in the images, or are too slow to be used to train deep architectures over thousands of iterations. To this end, we propose an accurate yet efficient approach for differentiable visibility and soft shadow computation. Our approach is based on the spherical harmonics approximations of the scene illumination and visibility, where the occluding surface is approximated with spheres. This allows for a significantly more efficient shadow computation compared to methods based on ray tracing. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and geometric deformation recovery from images using analysis-by-synthesis optimization.
