
Zhuoguang Chen

Complet4R: Geometric Complete 4D Reconstruction

Mar 28, 2026

Weibang Wang, Kenan Li, Zhuoguang Chen, Yijun Yuan, Hang Zhao

Abstract: We introduce Complet4R, a novel end-to-end framework for Geometric Complete 4D Reconstruction, which aims to recover temporally coherent and geometrically complete reconstructions of dynamic scenes. Our method formalizes Geometric Complete 4D Reconstruction as a unified framework of reconstruction and completion by directly accumulating the full context onto each frame. Unlike previous approaches that rely on pairwise reconstruction or local motion estimation, Complet4R uses a decoder-only transformer to process all context globally, directly from sequential video input, and reconstructs complete geometry for every timestamp, including occluded regions that are visible in other frames. Our method achieves state-of-the-art performance on our proposed benchmark for Geometric Complete 4D Reconstruction and on the 3D Point Tracking task. Code will be released to support future research.
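The abstract contrasts accumulating context globally across all frames with pairwise or local approaches. The following is not Complet4R's architecture (the abstract gives no such details); it is a minimal, generic sketch, assuming plain scaled dot-product attention over toy token vectors, of how letting each frame's tokens attend to tokens from every frame differs from frame-local attention. The names `local_context` and `global_context` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one frame = a list of token vectors

def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, tokens: List[Vector]) -> Vector:
    # Scaled dot-product attention; the tokens serve as both keys and values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]

def local_context(frames: List[Frame]) -> List[Frame]:
    # Each token only sees tokens from its own frame.
    return [[attend(tok, frame) for tok in frame] for frame in frames]

def global_context(frames: List[Frame]) -> List[Frame]:
    # Tokens from all frames are pooled, so each frame's output can draw on
    # content observed in other frames (in a real model, this is one way
    # information about regions occluded in the current frame could flow in).
    pool = [tok for frame in frames for tok in frame]
    return [[attend(tok, pool) for tok in frame] for frame in frames]
```

For two single-token frames `[[1.0, 0.0]]` and `[[0.0, 1.0]]`, `local_context` returns each token unchanged, while `global_context` mixes in the token from the other frame.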


TrackOcc: Camera-based 4D Panoptic Occupancy Tracking

Mar 11, 2025

Zhuoguang Chen, Kenan Li, Xiuyu Yang, Tao Jiang, Yiming Li, Hang Zhao

Abstract: Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks such as 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input. We further propose TrackOcc, an approach that processes image inputs in a streaming, end-to-end manner with 4D panoptic queries to address this task. Leveraging a localization-aware loss, TrackOcc improves the accuracy of 4D panoptic occupancy tracking without bells and whistles. Experimental results show that our method achieves state-of-the-art performance on the Waymo dataset. The source code will be released at https://github.com/Tsinghua-MARS-Lab/TrackOcc.

* Accepted at ICRA 2025
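TrackOcc is described as processing frames in a streaming manner with queries. A common way for query-based trackers to obtain temporal consistency is to carry each instance's query into the next frame so that its identity persists. The sketch below illustrates only that general idea and is not TrackOcc's implementation. Assumptions: `decoder` and `spawn` are hypothetical stand-ins for a learned decoder and a new-instance proposal step, embeddings are toy lists, and the 0.5 confidence threshold is arbitrary.

```python
from typing import Callable, Dict, List, Tuple

Embedding = List[float]
# Hypothetical decoder: refines a carried-over query on frame t and returns
# (updated_embedding, confidence). A real model would use image features here.
Decoder = Callable[[Embedding, int], Tuple[Embedding, float]]
# Hypothetical proposal step: embeddings for instances first seen at frame t.
Spawner = Callable[[int], List[Embedding]]

def run_stream(num_frames: int, decoder: Decoder, spawn: Spawner,
               threshold: float = 0.5) -> List[List[int]]:
    tracks: Dict[int, Embedding] = {}
    next_id = 0
    history: List[List[int]] = []
    for t in range(num_frames):
        survivors: Dict[int, Embedding] = {}
        # Propagate: a query that stays confident keeps its track ID.
        for track_id, emb in tracks.items():
            new_emb, score = decoder(emb, t)
            if score >= threshold:
                survivors[track_id] = new_emb
        # Spawn: newly appearing instances receive fresh IDs.
        for emb in spawn(t):
            survivors[next_id] = emb
            next_id += 1
        tracks = survivors
        history.append(sorted(tracks))  # active track IDs at frame t
    return history

# Toy stand-ins (hypothetical): confidence decays by 0.1 per frame.
def toy_decoder(emb: Embedding, t: int) -> Tuple[Embedding, float]:
    return emb, emb[0] - 0.1 * t

def toy_spawn(t: int) -> List[Embedding]:
    return {0: [[0.9], [0.65]], 2: [[0.95]]}.get(t, [])
```

With these toy inputs, `run_stream(4, toy_decoder, toy_spawn)` returns `[[0, 1], [0, 1], [0, 2], [0, 2]]`: track 0 keeps its ID throughout, track 1 drops out at frame 2 when its confidence falls below the threshold, and a newly appearing instance receives ID 2.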


End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

Nov 01, 2023

Yiran Guan, Zhuoguang Chen, Wenzheng Zeng, Zhiguo Cao, Yang Xiao


Abstract: In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), which facilitates video gaze estimation by capturing the spatial-temporal interaction context among the head, face, and eye in an end-to-end manner, an aspect that has received little attention so far. The main advantage of MCGaze is that the head, face, and eye clue localization tasks are solved jointly with gaze estimation in a single step, with joint optimization to seek the best performance. During this process, spatial-temporal context is exchanged among the head, face, and eye clues. Accordingly, the final gaze estimates, obtained by fusing features from the various queries, are simultaneously aware of global clues from the head and face and local clues from the eyes, which substantially improves performance. Meanwhile, the one-step design also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our method. The source code will be released at https://github.com/zgchen33/MCGaze.

* 5 pages, 3 figures, 3 tables
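The abstract states that the final gaze is obtained by fusing features from the head, face, and eye queries. As a minimal sketch, assuming concatenation followed by a single linear layer (the `weight` matrix is a hypothetical placeholder, not MCGaze's learned head), the fused output can be normalized into a 3D unit gaze direction. The helper `angular_error_deg` computes the angle between two gaze vectors, the usual error measure on 3D gaze benchmarks such as Gaze360.

```python
import math
from typing import List, Sequence

def fuse_clues(head: Sequence[float], face: Sequence[float], eye: Sequence[float],
               weight: Sequence[Sequence[float]]) -> List[float]:
    # Concatenate the per-clue query features into one vector.
    fused = list(head) + list(face) + list(eye)
    # Hypothetical linear head: `weight` has 3 rows, each of length len(fused).
    raw = [sum(w * x for w, x in zip(row, fused)) for row in weight]
    norm = math.sqrt(sum(r * r for r in raw))
    if norm == 0.0:
        raise ValueError("fused features map to a zero vector; no gaze direction")
    # Normalize so the output is a unit 3D gaze direction.
    return [r / norm for r in raw]

def angular_error_deg(a: Sequence[float], b: Sequence[float]) -> float:
    # Angle between two gaze vectors, in degrees.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(cos))
```

For example, with an identity weight matrix, clue features `[3.0]`, `[4.0]`, and `[0.0]` map to the gaze direction `[0.6, 0.8, 0.0]`.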
