Monocular 4D human-object interaction (HOI) reconstruction, i.e., recovering a moving human and a manipulated object from a single RGB video, remains challenging due to depth ambiguity and frequent occlusions. Existing methods often rely on multi-stage pipelines or iterative optimization, which incur high inference latency that falls short of real-time requirements and makes them prone to error accumulation. To address these limitations, we propose THO, an end-to-end Spatial-Temporal Transformer that predicts human motion and coordinated object motion in a single forward pass from an input video and a 3D object template. THO achieves this by leveraging spatial-temporal HOI tuple priors: spatial priors exploit contact-region proximity to infer occluded object features from human cues, while temporal priors capture cross-frame kinematic correlations to refine object representations and enforce physical coherence. Extensive experiments demonstrate that THO runs at 31.5 FPS on a single RTX 4090 GPU, a more than 600x speedup over prior optimization-based methods, while simultaneously improving reconstruction accuracy and temporal consistency. The project page is available at: https://nianheng.github.io/THO-project/