Cristian Sminchisescu

SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling

Nov 04, 2023
Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir, Cristian Sminchisescu

We present \emph{SPHEAR}, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from classical Non-Rigid Registration methods, which operate under various surface priors, thereby increasing reconstruction fidelity and minimizing the required human intervention. Additionally, SPHEAR is a \emph{complete} model that allows sampling not only diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and haircuts represented in detail as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of our registration, reconstruction and generation techniques.
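
As a rough illustration of the registration idea, the sketch below shows only the correspondence-lookup step once a template and a raw scan have both been mapped into a shared spherical embedding; the embeddings themselves, all names, and all array shapes are illustrative placeholders rather than the paper's actual pipeline.

import numpy as np

def normalize(x, eps=1e-9):
    # Project embedding vectors onto the unit sphere.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def spherical_correspondences(template_embed, scan_embed):
    # For every template vertex, pick the scan vertex whose spherical
    # embedding is closest (largest cosine similarity).
    cos = normalize(template_embed) @ normalize(scan_embed).T   # (N, M)
    return np.argmax(cos, axis=1)

# Toy usage with random data standing in for real embeddings and scans.
rng = np.random.default_rng(0)
template_embed = rng.normal(size=(1000, 3))    # hypothetical per-vertex template embedding
scan_embed = rng.normal(size=(5000, 3))        # hypothetical per-vertex scan embedding
scan_vertices = rng.normal(size=(5000, 3))     # raw scan geometry

idx = spherical_correspondences(template_embed, scan_embed)
registered = scan_vertices[idx]                # template vertices put in correspondence with the scan
print(registered.shape)                        # (1000, 3)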

* To be published at the International Conference on 3D Vision 2024 

Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction

Sep 11, 2023
Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu

We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones from a single monocular RGB image, enabling facial motion capture applications such as virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, and ii) a lightweight real-time model that predicts blendshape coefficients based on facial landmarks.
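
For readers unfamiliar with blendshapes, the sketch below shows how a set of 52 coefficients linearly drives a face mesh; the neutral mesh and the delta basis are random placeholders, so nothing here reflects the actual GHUM rig or the on-device model.

import numpy as np

def apply_blendshapes(neutral, deltas, weights):
    # neutral: (V, 3) rest mesh; deltas: (K, V, 3) per-blendshape offsets;
    # weights: (K,) coefficients, typically in [0, 1].
    return neutral + np.tensordot(weights, deltas, axes=1)

rng = np.random.default_rng(0)
V, K = 468, 52                                  # mesh size is arbitrary; 52 matches the abstract
neutral = rng.normal(size=(V, 3))               # placeholder neutral face
deltas = 0.01 * rng.normal(size=(K, V, 3))      # placeholder blendshape targets
weights = rng.uniform(0.0, 1.0, size=K)         # stand-in for the model's predicted coefficients

posed = apply_blendshapes(neutral, deltas, weights)
print(posed.shape)                              # (468, 3)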

* 4 pages, 3 figures 

Reconstructing Three-Dimensional Models of Interacting Humans

Aug 04, 2023
Mihai Fieraru, Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Vlad Olaru, Cristian Sminchisescu

Understanding 3d human interactions is fundamental for fine-grained scene analysis and behavioral modeling. However, most existing models predict incorrect, lifeless 3d estimates that miss the subtle human contact aspects--the essence of the event--and are of little use for detailed behavioral understanding. This paper addresses such issues with several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3d contact signature prediction; (2) we show how such components can be leveraged to ensure contact consistency during 3d reconstruction; (3) we construct several large datasets for learning and evaluating 3d contact prediction and reconstruction methods; specifically, we introduce CHI3D, an accurate lab-based 3d motion capture dataset with 631 sequences containing $2,525$ contact events and $728,664$ ground truth 3d poses, as well as FlickrCI3D, a dataset of $11,216$ images with $14,081$ processed pairs of people and $81,233$ facet-level surface correspondences. Finally, (4) we propose a methodology for recovering the ground-truth pose and shape of interacting people in a controlled setup and (5) annotate all 3d interaction motions in CHI3D with textual descriptions. Motion data in multiple formats (GHUM and SMPLX parameters, Human3.6m 3d joints) is made available for research purposes at \url{https://ci3d.imar.ro}, together with an evaluation server and a public benchmark.
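
The snippet below is a minimal, hypothetical illustration of how a predicted facet-level contact signature could be turned into a contact-consistency penalty during reconstruction; the mesh data, the signature, and all function names are placeholders, not the paper's implementation.

import numpy as np

def facet_centers(vertices, faces):
    # vertices: (V, 3); faces: (F, 3) vertex indices -> (F, 3) facet centers.
    return vertices[faces].mean(axis=1)

def contact_consistency(verts_a, faces_a, verts_b, faces_b, signature):
    # signature: (C, 2) predicted pairs (facet index on mesh A, facet index on mesh B).
    ca = facet_centers(verts_a, faces_a)[signature[:, 0]]
    cb = facet_centers(verts_b, faces_b)[signature[:, 1]]
    return np.mean(np.sum((ca - cb) ** 2, axis=1))   # mean squared gap over contact pairs

rng = np.random.default_rng(0)
verts_a, verts_b = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
faces = rng.integers(0, 100, size=(180, 3))          # shared placeholder topology
signature = rng.integers(0, 180, size=(25, 2))       # 25 hypothetical contact pairs
print(contact_consistency(verts_a, faces, verts_b, faces, signature))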

DreamHuman: Animatable 3D Avatars from Text

Jun 15, 2023
Nikos Kolotouros, Thiemo Alldieck, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Fieraru, Cristian Sminchisescu

We present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions. Recent text-to-3D methods have made considerable strides in generation, but still fall short in important aspects: control and, often, spatial resolution remain limited; existing methods produce fixed rather than animated 3D human models; and anthropometric consistency for complex structures like people remains a challenge. DreamHuman connects large text-to-image synthesis models, neural radiance fields, and statistical human body models in a novel modeling and optimization framework. This makes it possible to generate dynamic 3D human avatars with high-quality textures and learned, instance-specific, surface deformations. We demonstrate that our method is capable of generating a wide variety of animatable, realistic 3D human models from text. Our 3D models have diverse appearance, clothing, skin tones and body shapes, and significantly outperform both generic text-to-3D approaches and previous text-based 3D avatar generators in visual fidelity. For more results and animations please check our website at https://dream-human.github.io.
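
Purely as a schematic, the sketch below wires together the three kinds of components the abstract mentions (a text-to-image guidance signal, a differentiable renderer over a radiance field, and a posable body model) into an optimization loop; every component here is a random stub, so the loop only illustrates the control flow and is not the paper's actual objective or training procedure.

import numpy as np

rng = np.random.default_rng(0)
avatar_params = rng.normal(size=128)              # stand-in for radiance-field + deformation parameters

def sample_body_pose():
    return rng.normal(size=24 * 3)                # stand-in for a pose drawn from a body model

def render(params, pose):
    return rng.normal(size=(64, 64, 3))           # stand-in for differentiable rendering of the avatar

def text_guidance_grad(image, prompt):
    return rng.normal(size=image.shape)           # stand-in for text-to-image guidance on the render

def backprop_to_params(image_grad, params):
    return rng.normal(size=params.shape)          # stand-in for autodiff through the renderer

prompt = "a person wearing a red coat"
lr = 1e-2
for step in range(100):
    pose = sample_body_pose()                     # re-pose the avatar each iteration
    image = render(avatar_params, pose)           # render the current avatar in that pose
    g_img = text_guidance_grad(image, prompt)     # how the render should change to better match the text
    g = backprop_to_params(g_img, avatar_params)  # pull that signal back onto the avatar parameters
    avatar_params = avatar_params - lr * g        # gradient-style update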

* Project website at https://dream-human.github.io/ 

HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Dec 15, 2022
Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu

Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology that can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems, and with a growing body of dedicated datasets exposing this information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims: we achieve state-of-the-art results on the task of 3D pose estimation.
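
The sketch below illustrates the general notion of pixel-aligned LiDAR features: project each 3D point into the image, read off the feature at that pixel, and concatenate it with the point. The camera intrinsics, feature map, shapes and function names are invented for illustration, and the Transformer refinement stages are not shown.

import numpy as np

def pixel_aligned_features(points, feat_map, K):
    # points: (N, 3) in camera coordinates with z > 0; feat_map: (H, W, C);
    # K: (3, 3) pinhole intrinsics. Returns (N, 3 + C) fused point features.
    H, W, _ = feat_map.shape
    uvw = points @ K.T                            # project to homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    img_feats = feat_map[v, u]                    # nearest-pixel gather, (N, C)
    return np.concatenate([points, img_feats], axis=1)

rng = np.random.default_rng(0)
points = np.abs(rng.normal(size=(2048, 3))) + np.array([0.0, 0.0, 5.0])  # fake LiDAR points in front of the camera
feat_map = rng.normal(size=(120, 160, 32))                               # fake image feature map
K = np.array([[200.0, 0.0, 80.0], [0.0, 200.0, 60.0], [0.0, 0.0, 1.0]])  # hypothetical intrinsics
print(pixel_aligned_features(points, feat_map, K).shape)                 # (2048, 35)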

* Published at the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand 

PhoMoH: Implicit Photorealistic 3D Models of Human Heads

Dec 14, 2022
Mihai Zanfir, Thiemo Alldieck, Cristian Sminchisescu

We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
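
The sketch below illustrates only the layering idea: a coarse implicit surface plus a learned residual detail field queried at the same 3D points. The base model is a plain sphere SDF, the detail network is a tiny random MLP, and the color field is omitted, so none of this corresponds to the paper's trained models.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.normal(size=(3, 64)), np.zeros(64)    # weights of a tiny placeholder MLP
W2, b2 = 0.05 * rng.normal(size=(64, 1)), np.zeros(1)

def base_sdf(x):
    # Stand-in for the mid-resolution head geometry: signed distance to a unit sphere.
    return np.linalg.norm(x, axis=-1, keepdims=True) - 1.0

def detail_residual(x):
    # Tiny MLP predicting a small correction layered on top of the base surface.
    return np.tanh(x @ W1 + b1) @ W2 + b2

def layered_sdf(x):
    return base_sdf(x) + detail_residual(x)

queries = rng.uniform(-1.5, 1.5, size=(4096, 3))   # random 3D query points
print(layered_sdf(queries).shape)                  # (4096, 1)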

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

Dec 13, 2022
Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu

We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which, in turn, additionally helps model accessories, hair, and loose clothing. Building on this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition from a single end-to-end model, trained semi-supervised and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
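
As a minimal illustration of the pooling step, the sketch below samples points on a body mesh surface and attaches to each the image feature at its projected pixel; the mesh, camera, feature map and all names are placeholders, and the transformer that later processes these features is not shown.

import numpy as np

def sample_surface_points(vertices, faces, n, rng):
    # Surface samples via random faces plus random barycentric weights.
    f = rng.integers(0, len(faces), size=n)
    tri = vertices[faces[f]]                       # (n, 3, 3) triangle corners
    w = rng.dirichlet(np.ones(3), size=n)          # (n, 3) barycentric weights
    return np.einsum('nk,nkd->nd', w, tri), f      # points (n, 3) and their source face as a crude semantic id

def pool_image_features(points, feat_map, K):
    # Project each point with pinhole intrinsics K and gather the nearest pixel's feature.
    H, W, _ = feat_map.shape
    uvw = points @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[v, u]                          # (n, C)

rng = np.random.default_rng(0)
vertices = rng.normal(size=(6890, 3)) + np.array([0.0, 0.0, 6.0])       # placeholder body mesh in front of the camera
faces = rng.integers(0, 6890, size=(13776, 3))                          # placeholder topology
feat_map = rng.normal(size=(128, 128, 64))                              # placeholder image features
K = np.array([[150.0, 0.0, 64.0], [0.0, 150.0, 64.0], [0.0, 0.0, 1.0]])

points, sem = sample_surface_points(vertices, faces, 1000, rng)
structured = np.concatenate([points, pool_image_features(points, feat_map, K)], axis=1)
print(structured.shape)                                                 # (1000, 67)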

* Project page: https://enriccorona.github.io/s3f/ , Video: https://www.youtube.com/watch?v=mcZGcQ6L-2s 

Transformer-Based Learned Optimization

Dec 02, 2022
Erik Gärtner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu

In this paper, we propose a new approach to learned optimization. As is common in the literature, we represent the computation of the optimizer's update step with a neural network. The parameters of the optimizer are then learned on a set of training optimization tasks in order to perform minimization efficiently. Our main innovation is a new neural network architecture for the learned optimizer, inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates, but use a transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization approaches, our formulation allows for conditioning across different dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real-world task of physics-based reconstruction of articulated 3D human motion.
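
The sketch below illustrates the shape of such an update: a preconditioner formed as the identity plus a sum of predicted rank-one terms, applied to a predicted direction and scaled by a predicted step length. The "prediction" is a random stub rather than the transformer described in the paper, and all names are illustrative.

import numpy as np

def apply_preconditioner(vs, d):
    # Compute P d with P = I + sum_k v_k v_k^T, without forming P explicitly.
    return d + sum(v * np.dot(v, d) for v in vs)

def learned_step(theta, grad, predict):
    # One optimizer step from predicted rank-one vectors, direction and step length.
    vs, direction, alpha = predict(theta, grad)
    return theta - alpha * apply_preconditioner(vs, direction)

rng = np.random.default_rng(0)
def fake_predict(theta, grad):
    # Stub standing in for the transformer's prediction head.
    vs = [0.1 * rng.normal(size=theta.shape) for _ in range(4)]   # four rank-one updates
    return vs, grad, 0.01                                         # direction = raw gradient, fixed step length

# Toy quadratic objective f(theta) = 0.5 * ||theta||^2, so grad = theta.
theta = rng.normal(size=32)
for _ in range(50):
    theta = learned_step(theta, theta, fake_predict)
print(np.linalg.norm(theta))   # the norm shrinks over the iterations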

BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation

Jun 23, 2022
Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu

We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion-capture applications from a single RGB image, including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, ii) updated 3D body tracking with additional hand landmarks and iii) full body pose estimation from a monocular image.

* 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022 