Sai Tanmay Reddy Chakkera

Poppy: Polarization-based Plug-and-Play Guidance for Enhancing Monocular Normal Estimation

Mar 29, 2026

Irene Kim, Sai Tanmay Reddy Chakkera, Alexandros Graikos, Dimitris Samaras, Akshat Dave

Abstract: Monocular surface normal estimators trained on large-scale RGB-normal data often perform poorly in the edge cases of reflective, textureless, and dark surfaces. Polarization encodes surface orientation independently of texture and albedo, offering a physics-based complement for these cases. Existing polarization methods, however, require multi-view capture or specialized training data, limiting generalization. We introduce Poppy, a training-free framework that refines normals from any frozen RGB backbone using single-shot polarization measurements at test time. Keeping backbone weights frozen, Poppy optimizes per-pixel offsets to the input RGB and output normal along with a learned reflectance decomposition. A differentiable rendering layer converts the refined normals into polarization predictions and penalizes mismatches with the observed signal. Across seven benchmarks and three backbone architectures (diffusion, flow, and feed-forward), Poppy reduces mean angular error by 23-26% on synthetic data and 6-16% on real data. These results show that guiding learned RGB-based normal estimators with polarization cues at test time refines normals on challenging surfaces without retraining.

* Project page: https://irnkim.github.io/poppy/
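
The Poppy abstract reports its gains as a percentage reduction in mean angular error. As a rough illustration of how that metric is commonly computed, here is a minimal Python sketch. It is not taken from the paper or its code: the function names `angular_error_deg`, `mean_angular_error`, and `relative_reduction` are hypothetical, and the paper's exact evaluation protocol (masking, per-scene averaging) may differ.

```python
import math

def angular_error_deg(n_pred, n_true):
    """Angle in degrees between two 3D normal vectors (normalized internally)."""
    dot = sum(p * t for p, t in zip(n_pred, n_true))
    norm = math.sqrt(sum(p * p for p in n_pred)) * math.sqrt(sum(t * t for t in n_true))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))

def mean_angular_error(pred, true):
    """Average per-pixel angular error over paired lists of normals."""
    errors = [angular_error_deg(p, t) for p, t in zip(pred, true)]
    return sum(errors) / len(errors)

def relative_reduction(baseline, refined):
    """Percentage by which `refined` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - refined) / baseline
```

Under this definition, a hypothetical backbone with a mean angular error of 20 degrees that is refined to 15 degrees would show a 25% reduction, which is the kind of figure the abstract quotes.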

JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation

Sep 18, 2024

Sai Tanmay Reddy Chakkera, Aggelina Chatziagapi, Dimitris Samaras

Abstract: We introduce a novel method for joint expression and audio-guided talking face generation. Recent approaches either struggle to preserve the speaker identity or fail to produce faithful facial expressions. To address these challenges, we propose a NeRF-based network. Since we train our network on monocular videos without any ground truth, it is essential to learn disentangled representations for audio and expression. We first learn audio features in a self-supervised manner, given utterances from multiple subjects. By incorporating a contrastive learning technique, we ensure that the learned audio features are aligned to the lip motion and disentangled from the muscle motion of the rest of the face. We then devise a transformer-based architecture that learns expression features, capturing long-range facial expressions and disentangling them from the speech-specific mouth movements. Through quantitative and qualitative evaluation, we demonstrate that our method can synthesize high-fidelity talking face videos, achieving state-of-the-art facial expression transfer along with lip synchronization to unseen audio.

* Accepted by BMVC 2024. Project Page: https://starc52.github.io/publications/2024-07-19-JEAN
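
The JEAN abstract mentions a contrastive learning technique that aligns audio features with lip motion but does not say which loss is used. As a generic illustration only, the sketch below implements an InfoNCE-style loss, one widely used contrastive objective. Everything here is an assumption rather than the authors' method: the function names, the cosine similarity, the temperature of 0.1, and the pairing of `audio_feats[i]` with `lip_feats[i]` as positives.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(audio_feats, lip_feats, temperature=0.1):
    """Mean InfoNCE loss where audio_feats[i] and lip_feats[i] form a positive pair.

    All other lip features in the batch act as negatives for audio_feats[i].
    """
    total = 0.0
    for i, a in enumerate(audio_feats):
        logits = [cosine(a, l) / temperature for l in lip_feats]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(audio_feats)
```

The loss is small when each audio feature is most similar to its own lip feature and large when it is more similar to another sample's, which is the alignment behaviour the abstract describes in general terms.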
