Christian Theobalt

Max Planck Institute for Informatics, Saarland Informatics Campus

RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Jun 22, 2021

Jiayi Wang, Franziska Mueller, Florian Bernard, Suzanne Sorli, Oleksandr Sotnychenko, Neng Qian, Miguel A. Otaduy, Dan Casas, Christian Theobalt

Abstract: Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has a high relevance for several human-computer interaction applications, including AR/VR, robotics, or sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. In addition, our method even performs on par with depth-based real-time methods.

* ACM Transactions on Graphics (TOG) 39 (6), 1-16, 2020
* SIGGRAPH Asia 2020

Fast Simultaneous Gravitational Alignment of Multiple Point Sets

Jun 21, 2021

Vladislav Golyanik, Soshi Shimada, Christian Theobalt

Abstract: The problem of simultaneous rigid alignment of multiple unordered point sets which is unbiased towards any of the inputs has recently attracted increasing interest, and several reliable methods have been newly proposed. While being remarkably robust towards noise and clustered outliers, current approaches require sophisticated initialisation schemes and do not scale well to large point sets. This paper proposes a new resilient technique for simultaneous registration of multiple point sets by interpreting the latter as particle swarms rigidly moving in the mutually induced force fields. Thanks to the improved simulation with altered physical laws and acceleration of globally multiply-linked point interactions with a 2^D-tree (D is the space dimensionality), our Multi-Body Gravitational Approach (MBGA) is robust to noise and missing data while supporting more massive point sets than previous methods (with 10^5 points and more). In various experimental settings, MBGA is shown to outperform several baseline point set alignment approaches in terms of accuracy and runtime. We make our source code available for the community to facilitate the reproducibility of the results.

* 3DV 2020
* Project webpage: http://gvv.mpi-inf.mpg.de/projects/MBGA/

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

Jun 20, 2021

Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, Wenping Wang

Abstract: We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground masks as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e. bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state of the art in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.

* 22 pages, 17 figures
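
To make the phrase "zero-level set of a signed distance function" concrete, here is a minimal sketch that uses an analytic sphere SDF rather than a learned network. It locates the surface along a ray by sphere tracing, which is a standard SDF technique and not the volume rendering formulation that NeuS proposes. The function names and default parameters are assumptions chosen for illustration.

```python
import math


def sphere_sdf(point, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, zero on the surface."""
    return math.sqrt(sum(c * c for c in point)) - radius


def sphere_trace(origin, direction, sdf=sphere_sdf, t_max=10.0, eps=1e-9, max_steps=200):
    """Return the ray parameter t where the ray first reaches the zero-level set, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    t = 0.0
    for _ in range(max_steps):
        point = [o + t * u for o, u in zip(origin, unit)]
        dist = sdf(point)
        if dist < eps:
            # The SDF value is (numerically) zero: the point lies on the surface.
            return t
        # The SDF value is a safe step size: no surface is closer than dist.
        t += dist
        if t > t_max:
            return None
    return None


hit = sphere_trace(origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
miss = sphere_trace(origin=(2.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
```

In this toy setting the surface is known in closed form; in a neural SDF the same zero-level set is defined implicitly by the network output.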

Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Jun 15, 2021

Franziska Mueller, Micah Davis, Florian Bernard, Oleksandr Sotnychenko, Mickeal Verschoor, Miguel A. Otaduy, Dan Casas, Christian Theobalt

Abstract: We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach is the first two-hand tracking solution that combines an extensive list of favorable properties, namely it is marker-less, uses a single consumer-level depth camera, runs in real time, handles inter- and intra-hand collisions, and automatically adjusts to the user's hand shape. In order to achieve this, we embed a recent parametric hand pose and shape model and a dense correspondence predictor based on a deep neural network into a suitable energy minimization framework. For training the correspondence prediction network, we synthesize a two-hand dataset based on physical simulations that includes both hand pose and shape annotations while at the same time avoiding inter-hand penetrations. To achieve real-time rates, we phrase the model fitting in terms of a nonlinear least-squares problem so that the energy can be optimized based on a highly efficient GPU-based Gauss-Newton optimizer. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work, including tight two-hand grasps, significant inter-hand occlusions, and gesture interaction.

* ACM Transactions on Graphics (Proceedings SIGGRAPH 2019)
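
The idea of phrasing model fitting as a nonlinear least-squares problem solved with Gauss-Newton can be sketched on a toy problem. The example below fits a single 2D rotation angle so that rotated model points match observed points. It is a CPU-only illustration with one parameter, not the paper's hand model, energy terms, or GPU solver, and all names are hypothetical.

```python
import math


def rotate(theta, p):
    """Rotate the 2D point p by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def fit_rotation_gauss_newton(model_pts, observed_pts, theta=0.0, iterations=10):
    """Minimize sum_i ||R(theta) p_i - q_i||^2 over theta with Gauss-Newton steps."""
    for _ in range(iterations):
        jtj = 0.0  # J^T J (a scalar, since there is one parameter)
        jtr = 0.0  # J^T r
        for p, q in zip(model_pts, observed_pts):
            rx, ry = rotate(theta, p)
            residual = (rx - q[0], ry - q[1])
            # Derivative of the rotated point with respect to theta.
            jac = (-ry, rx)
            jtj += jac[0] * jac[0] + jac[1] * jac[1]
            jtr += jac[0] * residual[0] + jac[1] * residual[1]
        if jtj == 0.0:
            break  # all model points sit at the origin; theta is unobservable
        # Gauss-Newton update: solve (J^T J) delta = -J^T r.
        theta -= jtr / jtj
    return theta


model = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5)]
observed = [rotate(0.7, p) for p in model]
estimate = fit_rotation_gauss_newton(model, observed)
```

With more parameters, J^T J becomes a matrix and each iteration solves a linear system, which is the part that benefits from a parallel implementation.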

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control

Jun 03, 2021

Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, Christian Theobalt

Abstract: We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state of the art on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results.

Q-Match: Iterative Shape Matching via Quantum Annealing

May 06, 2021

Marcel Seelbach Benkner, Zorah Lähner, Vladislav Golyanik, Christof Wunderlich, Christian Theobalt, Michael Moeller

Abstract: Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP) that becomes infeasible for shapes with high sampling density. A promising research direction is to tackle such quadratic optimization problems over binary variables with quantum annealing, which, in theory, allows finding globally optimal solutions relying on a new computational paradigm. Unfortunately, enforcing the linear equality constraints in QAPs via a penalty significantly limits the success probability of such methods on currently available quantum hardware. To address this limitation, this paper proposes Q-Match, i.e., a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm, which allows solving problems an order of magnitude larger than current quantum methods. It works by implicitly enforcing the QAP constraints by updating the current estimates in a cyclic fashion. Further, Q-Match can be applied for shape matching problems iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems. Using the latest quantum annealer, the D-Wave Advantage, we evaluate the proposed method on a subset of QAPLIB as well as on isometric shape matching problems from the FAUST dataset.

* 16 pages, 12 figures and two tables; project page: http://gvv.mpi-inf.mpg.de/projects/QMATCH/
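
To illustrate why shape matching posed as a QAP becomes infeasible at high sampling density, the sketch below solves a tiny isometric matching instance by exhaustive search over all n! permutations. This is a classical brute-force baseline for illustration only; it is not Q-Match and involves no quantum annealing. The cost function (sum of absolute differences of pairwise distances) and all names are assumptions.

```python
from itertools import permutations


def pairwise_distances(points):
    """Distance matrix for points on a line (a stand-in for geodesic distances on a shape)."""
    return [[abs(a - b) for b in points] for a in points]


def qap_cost(dist_x, dist_y, perm):
    """Distortion of pairwise distances when point i of X is matched to point perm[i] of Y."""
    n = len(perm)
    return sum(
        abs(dist_x[i][j] - dist_y[perm[i]][perm[j]])
        for i in range(n)
        for j in range(n)
    )


def brute_force_match(dist_x, dist_y):
    """Try every permutation; the search space grows as n!, which is the scaling problem."""
    n = len(dist_x)
    best = min(permutations(range(n)), key=lambda p: qap_cost(dist_x, dist_y, p))
    return best, qap_cost(dist_x, dist_y, best)


xs = [0, 1, 3, 7]
true_perm = (2, 0, 3, 1)
ys = [0] * len(xs)
for i, target_index in enumerate(true_perm):
    ys[target_index] = xs[i]  # Y is X with its points relabelled

match, cost = brute_force_match(pairwise_distances(xs), pairwise_distances(ys))
```

Four points already require checking 24 permutations, and the count grows factorially with the number of sampled points.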

Real-time Deep Dynamic Characters

May 04, 2021

Marc Habermann, Lingjie Liu, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt

Abstract: We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance learned in a new weakly supervised way from multi-view imagery. In contrast to previous work, our controllable 3D character displays dynamics, e.g., the swing of the skirt, dependent on skeletal body motion in an efficient data-driven way, without requiring complex physics simulation. Our character model also features a learned dynamic texture model that accounts for photo-realistic motion-dependent appearance details, as well as view-dependent lighting effects. During training, we do not need to resort to difficult dynamic 3D capture of the human; instead we can train our model entirely from multi-view video in a weakly supervised manner. To this end, we propose a parametric and differentiable character representation which allows us to model coarse and fine dynamic deformations, e.g., garment wrinkles, as explicit space-time coherent mesh geometry that is augmented with high-quality dynamic textures dependent on motion and viewpoint. As input to the model, only an arbitrary 3D skeleton motion is required, making it directly compatible with the established 3D animation pipeline. We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing, including dynamics, and a neural generative dynamic texture model creates corresponding dynamic texture maps. We show that by merely providing new skeletal motions, our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, as well as video-realistic surface textures at a much higher level of detail than previous state-of-the-art approaches, and even in real time.

Neural Monocular 3D Human Motion Capture with Physical Awareness

May 03, 2021

Soshi Shimada, Vladislav Golyanik, Weipeng Xu, Patrick Pérez, Christian Theobalt

Abstract: We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub physionical, is aware of physical and environmental constraints. It combines in a fully differentiable way several key innovations, i.e., 1. a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, 2. an explicit rigid body dynamics model and 3. a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters -- both at train and test time. This enables more accurate global translation estimation without generalisability loss. Our model can be finetuned only with 2D annotations when the 3D annotations are not available. It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that significantly differ from common 3D pose estimation benchmarks such as Human 3.6M and MPI-INF-3DHP. Qualitative results are available at http://gvv.mpi-inf.mpg.de/projects/PhysAware/
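
As a rough illustration of the proportional-derivative (PD) controller mentioned in the abstract, the sketch below drives a unit point mass toward a target position. The gains here are fixed constants chosen by hand (an assumption for illustration), whereas the abstract states that the method predicts the gains with a neural network. The one-dimensional setup, the integrator, and all names are hypothetical simplifications.

```python
def simulate_pd(target, kp=100.0, kd=20.0, dt=0.01, steps=500, x=0.0, v=0.0):
    """Drive a unit mass toward `target` with a PD controller; return final (position, velocity)."""
    for _ in range(steps):
        # Proportional term pulls toward the target; derivative term damps the velocity.
        force = kp * (target - x) - kd * v
        # Semi-implicit Euler integration for a unit mass (acceleration equals force).
        v += force * dt
        x += v * dt
    return x, v


final_x, final_v = simulate_pd(target=1.0)
```

The default gains satisfy kd = 2 * sqrt(kp), which corresponds to critical damping for a unit mass, so the position settles on the target without oscillating.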

Estimating Egocentric 3D Human Pose in Global Space

Apr 30, 2021

Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt

Abstract: Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively.

Differentiable Event Stream Simulator for Non-Rigid 3D Tracking

Apr 30, 2021

Jalees Nehvi, Vladislav Golyanik, Franziska Mueller, Hans-Peter Seidel, Mohamed Elgharib, Christian Theobalt

Abstract: This paper introduces the first differentiable simulator of event streams, i.e., streams of asynchronous brightness change signals recorded by event cameras. Our differentiable simulator enables non-rigid 3D tracking of deformable objects (such as human hands, isometric surfaces and general watertight meshes) from event streams by leveraging an analysis-by-synthesis principle. So far, event-based tracking and reconstruction of non-rigid objects in 3D, like hands and body, has been either tackled using explicit event trajectories or large-scale datasets. In contrast, our method does not require any such processing or data, and can be readily applied to incoming event streams. We show the effectiveness of our approach for various types of non-rigid objects and compare to existing methods for non-rigid 3D tracking. In our experiments, the proposed energy-based formulations outperform competing RGB-based methods in terms of 3D errors. The source code and the new data are publicly available.

* In CVPR 2021 Workshop on Event-based Vision. Project page: http://gvv.mpi-inf.mpg.de/projects/Event-based_Non-rigid_3D_Tracking
