Alvaro Budria

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Apr 22, 2026

Dimitrije Antić, Alvaro Budria, George Paschalidis, Sai Kumar Dwivedi, Dimitrios Tzionas

Abstract: Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. While current methods rely on sparse, binary contact cues, these fail to model the continuous proximity and dense spatial relationships that characterize natural interactions. We address this limitation via InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. However, inferring these fields from single images is inherently ill-posed. To tackle this, our intuition is that interaction patterns are characteristically structured by the action and object geometry. We capture this structure in LEXIS, a novel discrete manifold of interaction signatures learned via a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework that leverages LEXIS signatures to estimate human and object meshes alongside their InterFields. Notably, these InterFields help in a guided refinement that ensures physically-plausible, proximity-aware reconstructions without requiring post-hoc optimization. Evaluation on Open3DHOI and BEHAVE shows that LEXIS-Flow significantly outperforms existing SotA baselines in reconstruction, contact, and proximity quality. Our approach not only improves generalization but also yields reconstructions perceived as more realistic, moving us closer to holistic 3D scene understanding. Code & models will be public at https://anticdimi.github.io/lexis.

* 26 pages, 11 figures, 4 tables. Project page: https://anticdimi.github.io/lexis

InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video

Nov 03, 2024

Alvaro Budria, Adrian Lopez-Rodriguez, Oscar Lorente, Francesc Moreno-Noguer

Abstract: We present InstantGeoAvatar, a method for efficient and effective learning from monocular video of detailed 3D geometry and appearance of animatable implicit human avatars. Our key observation is that the optimization of a hash grid encoding to represent a signed distance function (SDF) of the human subject is fraught with instabilities and bad local minima. We thus propose a principled geometry-aware SDF regularization scheme that seamlessly fits into the volume rendering pipeline and adds negligible computational overhead. Our regularization scheme significantly outperforms previous approaches for training SDFs on hash grids. We obtain competitive results in geometry reconstruction and novel view synthesis in as little as five minutes of training time, a significant reduction from the several hours required by previous work. InstantGeoAvatar represents a significant leap forward towards achieving interactive reconstruction of virtual avatars.

* Accepted as a poster at the Asian Conference on Computer Vision (ACCV 2024)

Topic Detection in Continuous Sign Language Videos

Sep 01, 2022

Alvaro Budria, Laia Tarres, Gerard I. Gallego, Francesc Moreno-Noguer, Jordi Torres, Xavier Giro-i-Nieto

Abstract: Significant progress has been made recently on challenging tasks in automatic sign language understanding, such as sign language recognition, translation and production. However, these works have focused on datasets with relatively few samples, short recordings and limited vocabulary and signing space. In this work, we introduce the novel task of sign language topic detection. We base our experiments on How2Sign, a large-scale video dataset spanning multiple semantic domains. We provide strong baselines for the task of topic detection and present a comparison between different visual features commonly used in the domain of sign language.

* Presented as an extended abstract in the "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop
